Pyscripter and printing line of CSV file - python

Here is my code:
import csv,math
StudentsTXT=open('students.txt')
csv_students=csv.reader(StudentsTXT, delimiter=',')
am =input('please search ip adress')
I also tried this code:
import csv,math
StudentsTXT=open('students.txt')
csv_students=csv.reader(StudentsTXT, delimiter=',')
print(csv_students['013'])
But this one comes up with an error saying: object has no attribute '__getitem__'.
How can I make one of these print out a line of my text file, which is already in CSV format?
The text file looks something like this:
010,Jane,Jones,30/11/2001,32|Ban Road,H.Num:899 421 223,Female,11Ca,JJ#school.com
012,John,Johnson,23/09/2001,43|Can Street,H.Num:999 123 323,Male,11Ca,JoJo#school.com
025,Jack,Jackson,29/02/2002,61|Cat grove,H.Num:998 434 656,Male,11Ca,JaJa#school.com
I want to be able to search for any student by name and then print all the information about them.

Question: I want it to print things like, for example:
002,John,Smith,01/01/2001,1 example road,000 000 000,male,11ca,js#school.com instead of: 001, john 002,jane
Change to csv.DictReader, for instance:
Note: Assuming you have NO Headers in the CSV File!
import csv

with open('students.txt') as fh:
    # Define header fieldnames in the same order as the columns in the file
    fieldnames = ['ID', 'Name', 'SName', 'Date', 'Address', 'Kontakt', 'F/M', 'Class', 'EMail']
    csv_students = csv.DictReader(fh, fieldnames=fieldnames)

    # Iterate over csv_students
    for student in csv_students:
        # Create a list of values ordered using the fieldnames
        student_list = [student[f] for f in fieldnames]
        print('{}'.format(', '.join(student_list)))

        # Print only part of the fields
        print('{s[ID]}: {s[Name]} {s[Address]}'.format(s=student))
Output:
010, Jane, Jones, 30/11/2001, 32|Ban Road, H.Num:899 421 223, Female, 11Ca, JJ#school.com
010: Jane 32|Ban Road
012, John, Johnson, 23/09/2001, 43|Can Street, H.Num:999 123 323, Male, 11Ca, JoJo#school.com
012: John 43|Can Street
025, Jack, Jackson, 29/02/2002, 61|Cat grove, H.Num:998 434 656, Male, 11Ca, JaJa#school.com
025: Jack 61|Cat grove
Tested with Python: 3.4.2
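To handle the search part of the question (find a student by name and print their whole row), a minimal sketch building on the same DictReader setup could look like this; the prompt text and the case-insensitive comparison are my own assumptions, not part of the original code:
import csv

search = input('Please enter a student name: ').strip().lower()

with open('students.txt') as fh:
    fieldnames = ['ID', 'Name', 'SName', 'Date', 'Address', 'Kontakt', 'F/M', 'Class', 'EMail']
    for student in csv.DictReader(fh, fieldnames=fieldnames):
        # compare the typed name against the first name, ignoring case
        if student['Name'].lower() == search:
            print(', '.join(student[f] for f in fieldnames))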


How do I transform a non-CSV text file into a CSV using Python/Pandas?

I have a text file that looks like this:
Id Number: 12345678
Location: 1234561791234567090-8.9
Street: 999 Street AVE
Buyer: john doe
Id Number: 12345688
Location: 3582561791254567090-8.9
Street: 123 Street AVE
Buyer: Jane doe # buyer % LLC
Id Number: 12345689
Location: 8542561791254567090-8.9
Street: 854 Street AVE
Buyer: Jake and Bob: Owner%LLC: Inc
I'd like the file to look like this:
Id Number,Location,Street,Buyer
12345678,1234561791234567090-8.9,999 Street AVE,john doe
12345688,3582561791254567090-8.9,123 Street AVE,Jane doe # buyer % LLC
12345689,8542561791254567090-8.9,854 Street AVE,Jake and Bob: Owner%LLC: Inc
I have tried the following:
# 1 Read text file and ignore bad lines (lines with extra colons thus reading as extra fields).
tr = pd.read_csv('C:\\File Path\\test.txt', sep=':', header=None, error_bad_lines=False)
# 2 Convert into a dataframe/pivot table.
ndf = pd.DataFrame(tr.pivot(index=None, columns=0, values=1))
# 3 Clean up the pivot table to remove NaNs and reset the index (line by line).
nf2 = ndf.apply(lambda x: x.dropna().reset_index(drop=True))
Here is where got the last line (#3): https://stackoverflow.com/a/62481057/10448224
When I do the above and export to CSV the headers are arranged like the following:
(index), Street, Buyer, Id Number, Location
The data is filled in nicely, but at some point the Buyer field becomes inaccurate, while the rest of the fields stay accurate through the entire DF.
My guesses:
When I run #1 part of my script I get the following errors 507 times:
b'Skipping line 500: expected 2 fields, saw 3\nSkipping line 728: expected 2 fields, saw 3\
At the tail end of the new DF I am missing exactly 507 entries for the Buyer field. So I think that when I drop the bad lines, the remaining fields push my data up.
Pain Points:
The Buyer field will sometimes have extra colons and other odd characters. So when I try to use a colon as a delimiter I run into problems.
I am new to Python and I am very new to using functions. I primarily use Pandas to manipulate data at a somewhat basic level. So in the words of the great Michael Scott: "Explain it to me like I'm five." Many many thanks to anyone willing to help.
Here's what I meant by reading the file in and using split. It is very similar to the other answers. Untested, and I don't recall whether inputline includes the EOL, so I stripped it too.
import pandas as pd

with open('myfile.txt') as f:
    data = []    # holds the database
    record = {}  # holds the record being built up
    for inputline in f:
        key, value = inputline.strip().split(':', 1)
        if key == "Id Number":       # new record starting
            if len(record):
                data.append(record)  # write out the previous record
            record = {}
        record.update({key: value})
    if len(record):
        data.append(record)          # write out the final record

df = pd.DataFrame(data)
This is a minimal example that demonstrates the basics:
cat split_test.txt
Id Number: 12345678
Location: 1234561791234567090-8.9
Street: 999 Street AVE
Buyer: john doe
Id Number: 12345688
Location: 3582561791254567090-8.9
Street: 123 Street AVE
Buyer: Jane doe # buyer % LLC
Id Number: 12345689
Location: 8542561791254567090-8.9
Street: 854 Street AVE
Buyer: Jake and Bob: Owner%LLC: Inc
import csv

with open("split_test.txt", "r") as f:
    id_val = "Id Number"
    list_var = []
    for line in f:
        # split only on the first colon so values that contain ':' stay intact
        split_line = line.strip().split(':', 1)
        print(split_line)
        if split_line[0] == id_val:
            d = {}
            d[split_line[0]] = split_line[1]
            list_var.append(d)
        else:
            d.update({split_line[0]: split_line[1]})
list_var
[{'Id Number': ' 12345678',
  'Location': ' 1234561791234567090-8.9',
  'Street': ' 999 Street AVE',
  'Buyer': ' john doe'},
 {'Id Number': ' 12345688',
  'Location': ' 3582561791254567090-8.9',
  'Street': ' 123 Street AVE',
  'Buyer': ' Jane doe # buyer % LLC'},
 {'Id Number': ' 12345689',
  'Location': ' 8542561791254567090-8.9',
  'Street': ' 854 Street AVE',
  'Buyer': ' Jake and Bob: Owner%LLC: Inc'}]
with open("split_ex.csv", "w") as csv_file:
field_names = list_var[0].keys()
csv_writer = csv.DictWriter(csv_file, fieldnames=field_names)
csv_writer.writeheader()
for row in list_var:
csv_writer.writerow(row)
I would try reading the file line by line, splitting the key-value pairs into a list of dicts that looks something like:
data = [
    {
        "Id Number": "12345678",
        "Location": "1234561791234567090-8.9",
        ...
    },
    {
        "Id Number": ...
    }
]
# easy to create the dataframe from here
your_df = pd.DataFrame(data)
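If the goal is a CSV file rather than just a DataFrame, pandas can write it out directly; a one-line sketch (the output filename is just an example):
# write the DataFrame to CSV with one column per key
your_df.to_csv('output.csv', index=False)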

Python text extractor & organizer

I need to extract details of some customers and save them in a new database. All I have is a txt file, and we are talking about 5000 customers or more. The txt file is saved like this:
first and last name
NAME SURNAME
zip country n. phone number mobile
United Kingdom +1111111111
e-mail
email#email.email
guest first and last name 1°
NAME SURNAME
guest first and last name 2°
NAME SURNAME
name address city province
NAME SURNAME London London
zip
AAAAA
Cancellation of the reservation.
Since the file is always structured like this, I was thinking there could be a way to scrape it, so I did some research. This is what I have come up with so far, but it is not really what I need:
with open('input.txt') as infile, open('output.txt', 'w') as outfile:
    copy = False
    for line in infile:
        if (line.find("first and last name") != -1):
            copy = True
        elif (line.find("Cancellation of the reservation.") != -1):
            copy = False
        elif copy:
            outfile.write(line)
The code works, but it simply reads the file from one line to the other and copies the content. I need something that will copy the content into another format that I am able to upload into the database. The format I need is this:
first and last name | zip country n. phone number mobile|e-mail|guest first and last name 1°|name address city province|zip
So in this case I need it like this:
NAME SURNAME | United Kingdom +1111111111|email#email.email|NAME SURNAME London London |AAAAA
And so on, one line like that for every reservation in output.txt.
These are some good text-scraping techniques for what you're looking to do:
data = '''first and last name
NAME SURNAME
zip country n. phone number mobile
United Kingdom +1111111111
e-mail
email#email.email
guest first and last name 1
NAME SURNAME
guest first and last name 2
NAME SURNAME
name address city province
NAME SURNAME London London
zip
AAAAA
Cancellation of the reservation.
'''
# split on space, convert to list
ldata = data.split()
# strip leading and trailing white space from each item
ldata = [i.strip() for i in ldata]
# split on line break, convert to list
ndata = data.split('\n')
ndata = [i.strip() for i in ndata]
# convert list to string
sdata = ' '.join(ldata)
print(ldata)
print(ndata)
print(sdata)
# two examples of split after, split before
name_surname = sdata.split('first and last name')[1].split('zip')[0]
print(name_surname)
country_phone = sdata.split('mobile')[1].split('e-mail')[0]
print(country_phone)
>>>
['first', 'and', 'last', 'name', 'NAME', 'SURNAME', 'zip', 'country', 'n.', 'phone', 'number', 'mobile', 'United', 'Kingdom', '+1111111111', 'e-mail', 'email#email.email', 'guest', 'first', 'and', 'last', 'name', '1', 'NAME', 'SURNAME', 'guest', 'first', 'and', 'last', 'name', '2', 'NAME', 'SURNAME', 'name', 'address', 'city', 'province', 'NAME', 'SURNAME', 'London', 'London', 'zip', 'AAAAA', 'Cancellation', 'of', 'the', 'reservation.']
['first and last name', 'NAME SURNAME', 'zip country n. phone number mobile', 'United Kingdom +1111111111', 'e-mail', 'email#email.email', 'guest first and last name 1', 'NAME SURNAME', 'guest first and last name 2', 'NAME SURNAME', 'name address city province', 'NAME SURNAME London London', 'zip', 'AAAAA', 'Cancellation of the reservation.', '']
first and last name NAME SURNAME zip country n. phone number mobile United Kingdom +1111111111 e-mail email#email.email guest first and last name 1 NAME SURNAME guest first and last name 2 NAME SURNAME name address city province NAME SURNAME London London zip AAAAA Cancellation of the reservation.
NAME SURNAME
United Kingdom +1111111111
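To build the pipe-delimited line the question asks for, one option is to pair each label line with the line that follows it, using the ndata list from above. This is a minimal, untested sketch; the wanted label list and the field order are assumptions based on the sample block:
# labels whose following line holds the wanted value (assumed order)
wanted = ['first and last name',
          'zip country n. phone number mobile',
          'e-mail',
          'guest first and last name 1',
          'name address city province',
          'zip']

record = {}
for label, value in zip(ndata, ndata[1:]):
    if label in wanted:
        record[label] = value
# join the values in the required order with ' | '
print(' | '.join(record.get(label, '') for label in wanted))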

read txt file into dictionary

I have the following type of document, where each person might have a couple of names and an associated description of features:
New person
name: ana
name: anna
name: ann
feature: A 65-year old woman that has no known health issues but has a medical history of Schizophrenia.
New person
name: tom
name: thomas
name: thimoty
name: tommy
feature: A 32-year old male that is known to be deaf.
New person
.....
What I would like is to read this file into a Python dictionary, where each new person is given an ID.
i.e. Person with ID 1 will have the names ['ann','anna','ana']
and will have the feature ['A 65-year old woman that has no known health issues but has a medical history of Schizophrenia.' ]
Any suggestions?
Assuming that your input file is lo.txt, it can be read into a dictionary this way:
file = open('lo.txt')
final_data = []
feature = []
names = []
for line in file.readlines():
    if "feature" in line:
        data = line.replace("\n", "").split(":")
        feature = data[1]
        final_data.append({
            'names': names,
            'feature': feature
        })
        names = []
        feature = []
    if "name" in line:
        data = line.replace("\n", "").split(":")
        names.append(data[1])
print(final_data)
Something like this might work
result = {}
f = open("document.txt")
contents = f.read()
# split on the "New person" marker and drop the empty chunk before the first one
info = [block for block in contents.split('New person') if block.strip()]
for i in range(len(info)):
    names = []
    features = []
    for line in info[i].split('\n'):
        parts = line.split(':', 1)
        if parts[0].strip() == 'name':
            names.append(parts[1].strip())
        elif parts[0].strip() == 'feature':
            features.append(parts[1].strip())
    result[i] = {'names': names, 'features': features}
print(result)
This should give you something like:
{0: {'names': ['ana', 'anna', 'ann'], 'features': ['...']}}
etc.
Here is code that may work for you:
f = open("documents.txt").readlines()
f = [i.strip('\n') for i in f]
final_condition = f[len(f)-1]
f.remove(final_condition)
names = [i.split(":")[1] for i in f]
the_dict = {}
the_dict["names"] = names
the_dict["features"] = final_condition
print the_dict
All it does is split each name line at ":" and keep the part after the colon for the names list; the last line of the file is used as the feature. Note that this only handles a file describing a single person.
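For the ID-keyed dictionary the question actually asks for, here is a minimal sketch (untested; it assumes the file sticks exactly to the 'New person' / 'name:' / 'feature:' layout shown above, and reuses the lo.txt filename from the first answer):
people = {}
person_id = 0
with open('lo.txt') as fh:
    for line in fh:
        line = line.strip()
        if line == 'New person':
            person_id += 1               # give the next person a new ID
            people[person_id] = {'names': [], 'feature': []}
        elif line.startswith('name:'):
            people[person_id]['names'].append(line.split(':', 1)[1].strip())
        elif line.startswith('feature:'):
            people[person_id]['feature'].append(line.split(':', 1)[1].strip())

# people[1] -> {'names': ['ana', 'anna', 'ann'], 'feature': ['A 65-year old woman ...']}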

File content into dictionary

I need to turn this file's content into a dictionary, so that every key in the dict is the name of a movie and every value is a set of the actors that play in it.
Example of file content:
Brad Pitt, Sleepers, Troy, Meet Joe Black, Oceans Eleven, Seven, Mr & Mrs Smith
Tom Hanks, You have got mail, Apollo 13, Sleepless in Seattle, Catch Me If You Can
Meg Ryan, You have got mail, Sleepless in Seattle
Diane Kruger, Troy, National Treasure
Dustin Hoffman, Sleepers, The Lost City
Anthony Hopkins, Hannibal, The Edge, Meet Joe Black, Proof
This should get you started:
line = "a, b, c, d"
result = {}
names = line.split(", ")
actor = names[0]
movies = names[1:]
result[actor] = movies
Try the following:
res_dict = {}
with open('my_file.txt', 'r') as f:
    for line in f:
        my_list = [item.strip() for item in line.split(',')]
        res_dict[my_list[0]] = my_list[1:]  # To make it a set, use: set(my_list[1:])
Explanation:
split() is used to split each line into a list, using ',' as the separator
strip() is used to remove the spaces around each element of the resulting list
When you use the with statement, you do not need to close your file explicitly.
[item.strip() for item in line.split(',')] is called a list comprehension.
Output:
>>> res_dict
{'Diane Kruger': ['Troy', 'National Treasure'], 'Brad Pitt': ['Sleepers', 'Troy', 'Meet Joe Black', 'Oceans Eleven', 'Seven', 'Mr & Mrs Smith'], 'Meg Ryan': ['You have got mail', 'Sleepless in Seattle'], 'Tom Hanks': ['You have got mail', 'Apollo 13', 'Sleepless in Seattle', 'Catch Me If You Can'], 'Dustin Hoffman': ['Sleepers', 'The Lost City'], 'Anthony Hopkins': ['Hannibal', 'The Edge', 'Meet Joe Black', 'Proof']}
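Note that both snippets above map each actor to their movies, while the question asks for the reverse: each movie mapped to a set of its actors. A minimal, untested sketch of that inversion, building on the second answer:
movies_dict = {}
with open('my_file.txt', 'r') as f:
    for line in f:
        actor, *movies = [item.strip() for item in line.split(',')]
        for movie in movies:
            # add the actor to this movie's set, creating the set on first sight
            movies_dict.setdefault(movie, set()).add(actor)

# movies_dict['Troy'] -> {'Brad Pitt', 'Diane Kruger'}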

Python 3: XML Tag Value not being written to csv file

My Python 3 script takes an XML file and creates a CSV file.
Small excerpt of the XML file:
<?xml version="1.0" encoding="UTF-8" ?>
<metadata>
<dc>
<title>Golden days for boys and girls, 1895-03-16, v. XVI #17</title>
<subject>Children's literature--Children's periodicals</subject>
<description>Archives &amp; Special Collections at the Thomas J. Dodd Research Center, University of Connecticut Libraries</description>
<publisher>James Elverson, 1880-</publisher>
<date>1895-06-15</date>
<type>Text | periodicals</type>
<format>image/jp2</format>
<handle>http://hdl.handle.net/11134/20002:860074494</handle>
<accessionNumber/>
<barcode/>
<identifier>20002:860074494 | local: 868010272 | local: 997186613502432 | local: 39153019382870 | hdl:  | http://hdl.handle.net/11134/20002:860074494</identifier>
<rights>These Materials are provided for educational and research purposes only. The University of Connecticut Libraries hold the copyright except where noted. Permission must be obtained in writing from the University of Connecticut Libraries and/or the owner(s) of the copyright to publish reproductions or quotations beyond "fair use." | The collection is open and available for research.</rights>
<creator/>
<relation/>
<coverage/>
<language/>
</dc>
</metadata>
Python3 code:
import csv
import xml.etree.ElementTree as ET

tree = ET.ElementTree(file='ctda_set1_uniqueTags.xml')
doc = ET.parse("ctda_set1_uniqueTags.xml")
root = tree.getroot()
oaidc_data = open('ctda_set1_uniqueTags.csv', 'w', encoding='utf-8')
titles = 'dc/title'
subjects = 'dc/subject'
csvwriter = csv.writer(oaidc_data)
oaidc_head = ['Title', 'Subject', 'Description', 'Publisher', 'Date', 'Type', 'Format', 'Handle', 'Accession Number', 'Barcode', 'Identifiers', 'Rights', 'Creator', 'Relation', 'Coverage', 'Language']
count = 0

for member in root.findall('dc'):
    if count == 0:
        csvwriter.writerow(oaidc_head)
        count = count + 1
    dcdata = []
    titles = member.find('title').text
    dcdata.append(titles)
    subjects = member.find('subject').text
    dcdata.append(subjects)
    descriptions = member.find('description').text
    dcdata.append(descriptions)
    publishers = member.find('publisher').text
    dcdata.append(publishers)
    dates = member.find('date').text
    dcdata.append(dates)
    types = member.find('type').text
    dcdata.append(types)
    formats = member.find('format').text
    dcdata.append(formats)
    handle = member.find('handle').text
    dcdata.append(handle)
    accessionNo = member.find('accessionNumber').text
    dcdata.append(accessionNo)
    barcodes = member.find('barcode').text
    dcdata.append(barcodes)
    identifiers = member.find('identifier').text
    dcdata.append(identifiers)
    rt = member.find('rights').text
    print(member.find('rights').text)
    dcdata.append('rt')
    ct = member.find('creator').text
    dcdata.append('ct')
    rt = member.find('relation').text
    dcdata.append('rt')
    ce = member.find('coverage').text
    dcdata.append('ce')
    lang = member.find('language').text
    dcdata.append('lang')
    csvwriter.writerow(dcdata)

oaidc_data.close()
Everything works as expected except for rt, ce, and lang. What happens is that in the CSV all the data is written with the comma delimiter, but for rt the value is always the literal string rt, for ce it is ce, for lang it is lang, and so on.
Here's a snippet of the output:
Title,Subject,Description,Publisher,Date,Type,Format,Handle,Accession Number,Barcode,Identifiers,Rights,Creator,Relation,Coverage,Language
"Golden days for boys and girls, 1895-03-16, v. XVI #17",Children's literature--Children's periodicals,"Archives & Special Collections at the Thomas J. Dodd Research Center, University of Connecticut Libraries","James Elverson, 1880-",1895-06-15,Text | periodicals,image/jp2,hdl.handle.net/11134/20002:860074494,,,20002:860074494 | local: 868010272 | local: 997186613502432 | local: 39153019382870,**rt,ct,rt,ce,lang**
Some of the rights statements get very long - perhaps that's the issue. That's why I added the print(member.find('rights')) to see the output. The text is printed just fine. The text just isn't written to the csv. What I'd like is to have the value or text written for these xml tags. Any help would be appreciated.
Thanks.
Jennifer
In the line dcdata.append('rt') there is no need for the quotes. Try dcdata.append(rt). Similarly, there are unnecessary quotes in the ct, ce, and lang lines (and in the second rt line, for relation).
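In other words, the tail of the loop should append the variables themselves rather than string literals. A minimal corrected sketch of just those lines (untested; the rel name is my own rename so that the relation value does not overwrite rt):
rt = member.find('rights').text
dcdata.append(rt)        # the variable, not the string 'rt'
ct = member.find('creator').text
dcdata.append(ct)
rel = member.find('relation').text
dcdata.append(rel)
ce = member.find('coverage').text
dcdata.append(ce)
lang = member.find('language').text
dcdata.append(lang)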
