I have a text file in this format:
subscriber=admin lname="adamec22a" password="kofola1224" first-name="Anton net na M.lehote,zapajal si to sam!!" last-name="Adamec 1.3.2012 skoncil zmluvu" phone="00421917499086" location="NB, Sturova 18, 2pos." rate-limit=" 1M/3M" last-seen=never
What I need to do in Python is separate the fields of each line with semicolons. If a field is missing (first-name, for example), the script should leave a blank space between the two semicolons.
Assuming that the input lines are consistently formatted, and that I understand what you're asking, you can recover the data in the way indicated here. Then you can output it in any way that suits you.
>>> pieces = '''subscriber=admin lname="adamec22a" password="kofola1224" first-name="Anton net na M.lehote,zapajal si to sam!!" last-name="Adamec 1.3.2012 skoncil zmluvu" phone="00421917499086" location="NB, Sturova 18, 2pos." rate-limit=" 1M/3M" last-seen=never'''.split('=')
>>> fieldNames = [ pieces[0] ]
>>> for i in range(1, -1+len(pieces)):
...     fieldNames.append(pieces[i][1+pieces[i].rfind(' '):])
...
>>> fieldNames
['subscriber', 'lname', 'password', 'first-name', 'last-name', 'phone', 'location', 'rate-limit', 'last-seen']
>>> fieldValues = [ pieces[-1]]
>>> for i in range(-2+len(pieces),0,-1):
...     fieldValues.append(pieces[i][:pieces[i].rfind(' ')])
...
>>> fieldValues.reverse()
>>> fieldValues
['admin', '"adamec22a"', '"kofola1224"', '"Anton net na M.lehote,zapajal si to sam!!"', '"Adamec 1.3.2012 skoncil zmluvu"', '"00421917499086"', '"NB, Sturova 18, 2pos."', '" 1M/3M"', 'never']
>>> for fieldName, fieldValue in zip(fieldNames, fieldValues):
...     fieldName, fieldValue
...
('subscriber', 'admin')
('lname', '"adamec22a"')
('password', '"kofola1224"')
('first-name', '"Anton net na M.lehote,zapajal si to sam!!"')
('last-name', '"Adamec 1.3.2012 skoncil zmluvu"')
('phone', '"00421917499086"')
('location', '"NB, Sturova 18, 2pos."')
('rate-limit', '" 1M/3M"')
('last-seen', 'never')
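To get from name/value pairs like these to the semicolon-separated line the question asks for, one option is to let the standard-library shlex module do the tokenizing. This is only a sketch under a few assumptions: the FIELDS list and the to_semicolon_row helper are made up for illustration, values are quoted shell-style as in the sample line, and no value contains a semicolon itself.

```python
import shlex

# Hypothetical fixed column order; adjust it to the fields your file really has.
FIELDS = ["subscriber", "lname", "password", "first-name", "last-name",
          "phone", "location", "rate-limit", "last-seen"]

def to_semicolon_row(line, fields=FIELDS, missing=" "):
    """Turn one key=value line into a semicolon-separated row."""
    record = {}
    # shlex.split keeps key="quoted value with spaces" together as one token
    # and drops the surrounding double quotes.
    for token in shlex.split(line):
        key, _, value = token.partition("=")
        record[key] = value
    # Fields absent from the line are replaced by the `missing` placeholder.
    return ";".join(record.get(field, missing) for field in fields)

line = 'subscriber=admin lname="adamec22a" phone="00421917499086" last-seen=never'
print(to_semicolon_row(line))
```

Fields that do not occur in the line come out as a single space between two semicolons; pass missing="" if you would rather have nothing there.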
I have a sample array ['first_name', 'last_name'] as input and would like the output to be "first_name", "last_name": no square brackets, but with double quotes around each element. I have tried the code below, but it doesn't seem to work. I'd appreciate any input on this.
The array is dynamic and can have any number of elements. Each element needs to be enclosed in double quotes, with no square brackets.
array_list = ['first_name', 'last_name']
string_list = list(array_list)
print(string_list)
array_list = ['first_name', 'last_name']
for i in array_list:
    print(f' "{i}" ',end=" ".join(","))
You can add the intended quotation marks with an f-string:
string_list = [f'"{item}"' for item in array_list]
print(", ".join(string_list))
array_list = ['first_name', 'last_name']
print(', '.join(f'"{e}"' for e in array_list))
Output:
"first_name", "last_name"
array_list = ['first_name', 'last_name']
pre_processed = [f'"{item}"' for item in array_list]
string_list = ", ".join(pre_processed)
print(string_list)
Output:
"first_name", "last_name"
You can also do it with a list-to-string conversion, but note that str() puts single quotes, not double quotes, around the elements:
Code
array_list = str(['first_name', 'last_name',5]).strip('[]')
print(array_list)
#-------OR--------
array_list = ['first_name', 'last_name'] # handles strings only; adds no quotes
print(",".join(array_list))
output
'first_name', 'last_name', 5
You can try the code below to achieve the same result. It uses a for loop to iterate through the array and build a string with double quotes around each element:
array_list = ['first_name', 'last_name']
string_list = ""
for element in array_list:
    string_list += '"' + element + '", '
string_list = string_list[:-2]
print(string_list)
All you really need to do is to join by the separator and put double quotes at front and back:
array_list = ['first_name', 'last_name']
print('"' + '", "'.join(array_list) + '"')
output: "first_name", "last_name"
Remember: when you need double quotes inside a string, surround it with single quotes: ' " ' (I've left blanks on purpose). Use " ' " to have single quotes.
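If the elements might contain double quotes themselves, or are not all strings, the approaches above either produce ambiguous output or raise a TypeError. One possible workaround, shown here as a sketch, is to let json.dumps do the quoting. The quote_join helper name is made up for illustration, and JSON-style backslash escaping is an assumption about what the consumer of the string expects.

```python
import json

def quote_join(items, sep=", "):
    """Join items, wrapping each one in double quotes (JSON-style escaping)."""
    # str() lets non-string elements through; json.dumps adds the quotes and
    # escapes any embedded double quotes or backslashes.
    return sep.join(json.dumps(str(item), ensure_ascii=False) for item in items)

print(quote_join(['first_name', 'last_name']))
print(quote_join(['say "hi"', 5]))
```

For plain identifiers such as column names this gives the same result as the join-based answers.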
I have written out my code and when I run it, I get a KeyError:
Traceback (most recent call last):
  File "C:/Users/sagar/Desktop/Sagar CS131B Files/convert_to_fixed.py", line 21, in <module>
    birthdate = sample['Birthdate']
KeyError: 'Birthdate'
my code:
inputFile = 'raw.data.py'
data = list()
columns = ['First name','Last name','Telephone','Address','City','State','Birthdate']
for line in open(inputFile):
    # Assuming comments in the text file as '#'
    if line.startswith('#'): continue
    row = line.strip().split(':')
    data.append(dict(zip(columns, row)))
#print(data)
formatted_data = list()
for sample in data:
    birthdate = sample['Birthdate']
    mm,dd,yy = birthdate.split('/')
    if len(yy)==2:
        yy = '19' + yy
    birthdate = '/'.join([mm,dd,yy])
    sample['Birthdate'] = birthdate
    modified_row = ':'.join(
        [sample['Last name'], sample['First name'],
         sample['Telephone'], sample['Address'],
         sample['City'], sample['State'], sample['Birthdate']])
    formatted_data.append(modified_row + '\n')
with open('fixed.data','w') as f:
    f.writelines(formatted_data)
I have looked up how to fix it, but I'm not sure how to write the try-except. If someone could help me out with this, that would be amazing.
This is what is inside the file given:
'Betty:Boop:245-836-8357:635 Cutesy Lane:Hollywood:CA:6/23/1923',
'Ephram:Hardy:293-259-5395:235 Carlton Lane:Joliet:IL:8/12/1920',
'Fred:Fardbarkle:674-843-1385:20 Parak Lane:DeLuth:MN:4/12/23',
'Igor:Chevsky:385-375-8395:3567 Populus Place:Caldwell:NJ:6/18/68',
'James:Ikeda:834-938-8376:23445 Aster Ave.:Allentown:NJ:12/1/1938',
'Jennifer:Cowan:548-834-2348:408 Laurel Ave.:Kingsville:TX:10/1/35',
'Jesse:Neal:408-233-8971:45 Rose Terrace:San Francisco:CA:2/3/2001',
'Jon:DeLoach:408-253-3122:123 Park St.:San Jose:CA:7/25/53',
'Jose:Santiago:385-898-8357:38 Fife Way:Abilene:TX:1/5/58',
'Karen:Evich:284-758-2867:23 Edgecliff Place:Lincoln:NB:11/3/35',
'Lesley:Kirstin:408-456-1234:4 Harvard Square:Boston:MA:4/22/2001',
'Lori:Gortz:327-832-5728:3465 Mirlo Street:Peabody:MA:10/2/65',
'Norma:Corder:397-857-2735:74 Pine Street:Dearborn:MI:3/28/45',
'Paco:Gutierrez:835-365-1284:454 Easy Street:Decatur:IL:2/28/53',
'Popeye:Sailor:156-454-3322:945 Bluto Street:Anywhere:USA:3/19/35',
'Sir:Lancelot:837-835-8257:474 Camelot Boulevard:Bath:WY:5/13/69',
'Steve:Blenheim:238-923-7366:95 Latham Lane:Easton:PA:11/12/1956',
'Tommy:Savage:408-724-0140:1222 Oxbow Court:Sunnyvale:CA:5/19/66',
'Vinh:Tranh:438-910-7449:8235 Maple Street:Wilmington:VM:9/23/63',
'William:Kopf:846-836-2837:6937 Ware Road:Milton:PA:9/21/46',
'Yukio:Takeshida:387-827-1095:13 Uno Lane:Ashville:NC:7/1/29',
'Zippy:Pinhead:834-823-8319:2356 Bizarro Ave.:Farmount:IL:1/1/67',
'Andy:Warhol:212-321-7654:231 East 47th Street:New York City:NY:8/6/1928'
zip() only produces results up to the length of the shortest iterable:
print(list(zip([1,2],[1,2,3,4,5,6]))) # [(1, 1), (2, 2)]
Your source data apparently contains at least one line with fewer elements than the others, which is why one of your dicts does not have the 'Birthdate' key (the last column).
You can guard against it:
data = list()
columns = ['First name', 'Last name', 'Telephone',
           'Address', 'City', 'State', 'Birthdate']
# use a context manager for file open (inputFile as defined in the question)
with open(inputFile) as f:
    for line in f:
        # Assuming comments in the text file as '#'
        if line.startswith('#'):
            continue
        # ignore empty lines (you can combine with above)
        if not line.strip():
            continue
        row = line.strip().split(':')
        # raise exception if the line has the wrong number of fields
        if len(row) != len(columns):
            raise ValueError(f"Wrong number of datapoints in line: {line!r}")
        data.append(dict(zip(columns, row)))
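Since the question asks specifically about try-except, here is a rough sketch of how malformed lines could be skipped rather than aborting the whole run. The helper names fix_line and fix_lines are made up, and stripping quotes and trailing commas is an assumption based on how the file contents are shown in the question.

```python
def fix_line(line):
    """Return 'Last:First:...:mm/dd/yyyy'; raise ValueError if the line is malformed."""
    # Assumption: lines may be wrapped in quotes and end with a comma,
    # as in the listing from the question.
    fields = line.strip().strip("',\"").split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    first, last, phone, address, city, state, birthdate = fields
    mm, dd, yy = birthdate.split("/")  # ValueError unless there are exactly 3 parts
    if len(yy) == 2:
        yy = "19" + yy
    return ":".join([last, first, phone, address, city, state, f"{mm}/{dd}/{yy}"])

def fix_lines(lines):
    """Convert every well-formed line and collect the ones that could not be parsed."""
    fixed, skipped = [], []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line or comment
        try:
            fixed.append(fix_line(line))
        except ValueError:
            skipped.append(line)  # keep it so it can be reported later
    return fixed, skipped

fixed, skipped = fix_lines([
    "'Fred:Fardbarkle:674-843-1385:20 Parak Lane:DeLuth:MN:4/12/23',",
    "[",
])
print(fixed)
print(skipped)
```

Catching only ValueError keeps unrelated bugs visible; the skipped list tells you which lines to inspect afterwards.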
I have a long dictionary which looks like this:
name = 'Barack.'
name_last = 'Obama!'
street_name = "President Streeet?"
list_of_slot_names = {'name':name, 'name_last':name_last, 'street_name':street_name}
I want to remove the punctuation from every slot (name, name_last, ...).
I could do it this way:
name = name.translate(str.maketrans('', '', string.punctuation))
name_last = name_last.translate(str.maketrans('', '', string.punctuation))
street_name = street_name.translate(str.maketrans('', '', string.punctuation))
Do you know a shorter (more compact) way to write this?
Result:
>>> print(name, name_last, street_name)
Barack Obama President Streeet
Use a loop or a dictionary comprehension (after import string):
{k: v.translate(str.maketrans('', '', string.punctuation)) for k, v in list_of_slot_names.items()}
You can either assign this back to list_of_slot_names to overwrite the existing values, or assign it to a new variable.
You can then print the values via
print(*list_of_slot_names.values())
name = 'Barack.'
name_last = 'Obama!'
empty_slot = None
street_name = "President Streeet?"
print(*[str_.strip('.?!') for str_ in (name, name_last, empty_slot, street_name) if str_ is not None])
-> Barack Obama President Streeet
If you also want to remove them from the middle of the strings, do this instead:
import re
name = 'Barack.'
name_last = 'Obama!'
empty_slot = None
street_name = "President Streeet?"
print([re.sub('[.?!]+',"",str_) for str_ in (name, name_last, empty_slot, street_name) if str_ is not None])
import re, string
s = 'hell:o? wor!d.'
clean = re.sub(rf"[{re.escape(string.punctuation)}]", "", s)  # escape so ], \ and - stay literal inside the class
print(clean)
output
hello world
I'm using conda 4.5.11 and python 3.6.3 to read a dynamic list, such as this:
[['Results:',
'2',
'Time:',
'16',
'Register #1',
'Field1:',
'999999999999999',
'Field2:',
'name',
'Field3:',
'some text',
'Field4:',
'number',
'Fieldn:',
'other number',
'Register #2',
'Field1:',
'999999999999999',
'Field2:',
'name',
'Field3:',
'type',
'Field4:',
'some text',
'FieldN:',
'some text',
'Register #N',
...
]]
Here is the code for my best try:
data = []
header = []
data_text = []
for data in res:
    part = data.split(":")
    header_text = part[1]
    data_t = part[2]
    header.append(header_text)
    data_text.append(data_t)
df_data = pd.DataFrame(data_text)
df_header = pd.DataFrame(header)
Output
Field1           Field2  Field3    Field4  Fieldn1  Fieldn2  Fieldn
999999999999999  name    sometext  number  number   text     number
999999999999999  name    sometext  number  number   number   NAN
999999999999999  name    number    NAN     number   text     number
Is it possible to read from a list and concat in one DataFrame?
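One way to approach this, sketched here under assumptions read off the sample list, is to group the flat list into one dict per register first. The assumptions are that every register starts with an item beginning with "Register #", that every label ends with a colon, and that each label is followed by exactly one value. The registers_to_records helper name is made up for illustration.

```python
def registers_to_records(items):
    """Group a flat ['Label:', 'value', ...] list into one dict per register."""
    records = []
    current = None  # dict for the register being filled, if any
    it = iter(items)
    for item in it:
        if item.startswith("Register #"):
            current = {}
            records.append(current)
        elif item.endswith(":"):
            value = next(it, None)  # the item right after a label is its value
            if current is not None:  # ignore header pairs such as 'Results:', '2'
                current[item.rstrip(":")] = value
    return records

res = [['Results:', '2', 'Time:', '16',
        'Register #1', 'Field1:', '999999999999999', 'Field2:', 'name',
        'Register #2', 'Field1:', '999999999999999', 'Field3:', 'type']]
records = registers_to_records(res[0])
print(records)
```

Passing the resulting list of dicts to pd.DataFrame(records) then gives one row per register, with NaN wherever a register lacks a field.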
; commentary
[owner]
name=Justin Case
organization=Chilling Inc.
[database]
; more commentary
server=192.0.0.1
port=123
file=something.csv
[third section]
attribute=value,
that extends to
the third line,
but not the fourth
Given the above ini contents, I have to construct a dictionary such as
{'owner' : {'name' : 'Justin Case','organization' : 'Chilling Inc.'},
'database' : {'server' : '192.0.0.1', 'port' : '123', 'file' : 'something.csv'},
'third section' : {'attribute' : 'multiline value'}}
I realize there is a configuration file parser (configparser), but it is not allowed for this assignment.
Progress at the moment:
with open('ini.txt', encoding='utf8') as data:
    lines = [row for row in data]
lines_nocom = []
for row in lines:
    if not row.startswith(';'):
        lines_nocom.append(row)
dictt = {}
I removed the rows with comments in them since they are unnecessary.
How can I make Python recognize the sections and their respective attributes?
For example, section1 could have 2 attributes and section2 could have any number of attributes.
If I do [row for row in lines_nocom], how does it recognize where one section ends and another begins?
How do I make Python recognize a multiline value?
Track the current section and add your keys to that; each time you find a line using square brackets create a new section.
For continuation lines, do something similar; track the last used name:
with open('ini.txt', encoding='utf8') as data:
    section = None  # current section
    name = None     # current name being stored
    result = {}
    for line in data:
        line = line.strip()
        if not line or line.startswith(';'):
            # skip comments and empty lines
            continue
        if line.startswith('[') and line.endswith(']'):
            # new section
            section_name = line.strip('[]')
            section = result[section_name] = {}
            continue
        # add entries to the existing section
        if '=' in line:
            name, _, value = line.partition('=')
            name = name.strip()
            section[name] = value.strip()
        else:
            # adding to last-used name
            section[name] += ' ' + line
Demo:
>>> from io import StringIO
>>> from pprint import pprint
>>> sample = StringIO('''\
... ; commentary
... [owner]
... name=Justin Case
... organization=Chilling Inc.
...
... [database]
... ; more commentary
... server=192.0.0.1
... port=123
... file=something.csv
...
... [third section]
... attribute=value,
... that extends to
... the third line,
... but not the fourth
... ''')
>>> section = None # current section
>>> name = None # current name being stored
>>> result = {}
>>> for line in sample:
...     line = line.strip()
...     if not line or line.startswith(';'):
...         # skip comments and empty lines
...         continue
...     if line.startswith('[') and line.endswith(']'):
...         # new section
...         section_name = line.strip('[]')
...         section = result[section_name] = {}
...         continue
...     # add entries to the existing section
...     if '=' in line:
...         name, _, value = line.partition('=')
...         name = name.strip()
...         section[name] = value.strip()
...     else:
...         # adding to last-used name
...         section[name] += ' ' + line
...
>>> pprint(result)
{'database': {'file': 'something.csv', 'port': '123', 'server': '192.0.0.1'},
 'owner': {'name': 'Justin Case', 'organization': 'Chilling Inc.'},
 'third section': {'attribute': 'value, that extends to the third line, but '
                                'not the fourth'}}
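The loop above fails with a TypeError if an entry appears before the first section header, because section is still None at that point. A possible variation, sketched here as a function, reports such lines explicitly. The parse_ini name and the error messages are made up for illustration, and merging repeated sections via setdefault is a design choice that differs from the code above, which replaces them.

```python
def parse_ini(lines):
    """Parse an iterable of INI-style lines into a dict of dicts."""
    result = {}
    section = None  # dict for the current section
    name = None     # last key seen, used for continuation lines
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith(';'):
            continue  # skip comments and empty lines
        if line.startswith('[') and line.endswith(']'):
            # new section; a repeated header reuses the existing dict
            section = result.setdefault(line[1:-1].strip(), {})
            name = None
            continue
        if section is None:
            raise ValueError(f"line {lineno}: entry before any [section]")
        if '=' in line:
            name, _, value = line.partition('=')
            name = name.strip()
            section[name] = value.strip()
        elif name is not None:
            section[name] += ' ' + line  # continuation of the previous value
        else:
            raise ValueError(f"line {lineno}: continuation line without a key")
    return result

sample = """; commentary
[owner]
name=Justin Case

[third section]
attribute=value,
 that extends to
 the third line
"""
config = parse_ini(sample.splitlines())
print(config)
```

Because it accepts any iterable of lines, the same function works on an open file object as well as on a list of strings.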