Format String Vertically - python

I am receiving data in Python that comes as a string from a Firebase database. How can I format it so that it is more readable for the user?
Received Output
{'date': '07-Oct-2019', 'day': 'Monday', 'driver': 'John '}
Desired Output
date : 07-Oct-2019
day : Monday
driver : John

A simple one-liner should do, once the data is a dict:
d={'date': '07-Oct-2019', 'day': 'Monday', 'driver': 'John '}
print("\n".join([k+" : "+v for k,v in d.items()]))
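Since the question says the data arrives as a string rather than a dict, here is a minimal sketch of the extra parsing step. It assumes the string is a valid Python literal exactly like the one shown; the names raw and format_record are made up for illustration.

```python
import ast

# Assumption: the Firebase value arrives as text that looks like a Python dict.
raw = "{'date': '07-Oct-2019', 'day': 'Monday', 'driver': 'John '}"

def format_record(text):
    """Parse a dict-like string and return one 'key : value' line per entry."""
    # literal_eval only accepts Python literals, so it is safer than eval()
    record = ast.literal_eval(text)
    # strip() removes stray whitespace such as the trailing space in 'John '
    return "\n".join(f"{key} : {str(value).strip()}" for key, value in record.items())

print(format_record(raw))
```

If the string is real JSON (double quotes), json.loads would be the parser to use instead.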

Related

How to convert format date from MM/DD/YYYY to YYYY-MM-DDT00:00:00.000Z in Python

I used pandas to create a list of dictionaries. The following code is how I create the list:
sheetwork = client.open('RMA Daily Workload').sheet1
list_of_work = sheetwork.get_all_records()
dfr = pd.DataFrame(list_of_work, columns = ['date' , 'value'])
rnow = dfr.to_dict('records')
The following is the output of my list:
rnow =
[{'date': '01/02/2020', 'value': 13},
{'date': '01/03/2020', 'value': 2},
{'date': '01/06/2020', 'value': 5},
...
{'date': '01/07/2020', 'value': 6}]
I want to change the date format from MM/DD/YYYY to YYYY-MM-DDT00:00:00.000Z, so that my data will be compatible with my javascript file where I want to add my data.
I want my list to be shown as:
rnow =
[{'date': '2020-01-02T00:00:00.000Z', 'value': 13},
{'date': '2020-01-03T00:00:00.000Z', 'value': 2},
{'date': '2020-01-06T00:00:00.000Z', 'value': 5},
...
{'date': '2020-01-07T00:00:00.000Z', 'value': 6}]
I tried many methods but could only convert the dates into 2020-01-02 00:00:00, not 2020-01-02T00:00:00.000Z. Please advise what I should do.
If you need exactly the T00:00:00.000Z suffix after the date, write it literally into the strftime format string after parsing the date,
e.g.,
import datetime
# '2020-01-07T00:00:00.000Z'
datetime.datetime.strptime("01/07/2020", '%m/%d/%Y').strftime('%Y-%m-%dT00:00:00.000Z')
How to apply to pandas:
def func(x):
    # x is one row of the dataframe; the source dates are MM/DD/YYYY
    myDate = x['date']
    return datetime.datetime.strptime(myDate, '%m/%d/%Y').strftime('%Y-%m-%dT00:00:00.000Z')
df['new_date'] = df.apply(func, axis=1)
To keep it simple, stay in UTC, and take advantage of the fact that you are already using pandas:
rnow = [{'date': '01/02/2020', 'value': 13},
{'date': '01/03/2020', 'value': 2},
{'date': '01/06/2020', 'value': 5},
{'date': '01/07/2020', 'value': 6}]
def get_isoformat(date):
    return pd.to_datetime(date, dayfirst=False, utc=True).isoformat()
for i in range(len(rnow)):
    rnow[i]['date'] = get_isoformat(rnow[i]['date'])
rnow
which outputs:
[{'date': '2020-01-02T00:00:00+00:00', 'value': 13},
{'date': '2020-01-03T00:00:00+00:00', 'value': 2},
{'date': '2020-01-06T00:00:00+00:00', 'value': 5},
{'date': '2020-01-07T00:00:00+00:00', 'value': 6}]
In fact, you probably want to apply get_isoformat() to your dataframe directly for simplicity. Also, passing utc=None drops the +00:00 part in case you don't want or need it.
Edit
To get specifically 2020-01-02T00:00:00Z try:
pd.to_datetime(date, dayfirst=False, utc=False).isoformat()+'Z'
You can use the isoformat function of Python's builtin datetime package:
from datetime import datetime, timezone
formatted = datetime.strptime('01/02/2020', '%m/%d/%Y').replace(tzinfo=timezone.utc).isoformat()
formatted
# Output: '2020-01-02T00:00:00+00:00'
Note that isoformat() does not emit the Z suffix for the UTC timezone; it writes +00:00 instead, which is also valid ISO 8601 and should parse in other code just fine.
If this is a problem, you can omit the timezone and instead manually put a Z there:
from datetime import datetime
formatted = datetime.strptime('01/02/2020', '%m/%d/%Y').isoformat() + 'Z'
formatted
# Output: '2020-01-02T00:00:00Z'
Alternatively (in a more "manual" approach), you could format the date using strftime:
from datetime import datetime
formatted = datetime.strptime('01/02/2020', '%m/%d/%Y').strftime('%Y-%m-%dT00:00:00Z')
formatted
# Output: '2020-01-02T00:00:00Z'
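The question asked for the millisecond form (.000Z) on every record in the list. A minimal standard-library sketch of that, assuming the dates are always MM/DD/YYYY and should be treated as UTC midnight; to_js_iso and converted are hypothetical names, not from the answers above.

```python
from datetime import datetime

def to_js_iso(date_text):
    """Convert 'MM/DD/YYYY' into 'YYYY-MM-DDT00:00:00.000Z'."""
    parsed = datetime.strptime(date_text, "%m/%d/%Y")
    # timespec="milliseconds" (Python 3.6+) forces the '.000' part;
    # the 'Z' is appended by hand because isoformat() never emits it
    return parsed.isoformat(timespec="milliseconds") + "Z"

rnow = [
    {"date": "01/02/2020", "value": 13},
    {"date": "01/03/2020", "value": 2},
]

# Build new dicts instead of mutating the originals
converted = [{**row, "date": to_js_iso(row["date"])} for row in rnow]
print(converted)
```

Building a new list keeps the original rnow intact, which makes it easier to compare input and output while debugging.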

Python Regex re.search - groupdict() - Date format matching

I need to get the date and month from various strings such as '14th oct', '14oct', '14.10', '14 10' and '14/10'. For these cases my code below works fine.
import re
query = '14.oct'
print(re.search(r'(?P<date>\b\d{1,2})(?:\b|st|nd|rd|th)?(?:[\s\.\-/_\\,]*)(?P<month>\d{1,2}|[a-z]{3,9})', query, re.I).groupdict())
Result:-
{'date': '14', 'month': 'oct'}
But for this case (1410), it still captures a date and month. I don't want that, because the whole string is a number in another format and should not be treated as a date and month. The result should be None.
How should I change the search pattern for this? (using groupdict() only)
Edited:-
The parentheses around the number above (1410) are just to set it apart from the other text. I mean 1410 only.
The solution below is what I want. I got the idea from the answer by #the-fourth-bird, adding (?!\d{3,}\b) to the regex pattern.
Thanks🙏🏽
Final Solution
import re
queries = ['14 10', '14.10', '1410', '14-10', '14/10', '14,10', '17800', '14th oct', '14thoct', '14th-oct', '14th/oct', '14-oct', '14.oct', '14oct']
max_indent = len(max(queries, key = len)) + 1
for query in queries:
    if resp := re.search(r'(?P<date>\b(?!\d{3,}\b)\d{1,2})(?:\b|st|[nr]d|th)?(?:[\s./_\\,-]*)(?P<month>\d{1,2}|[a-z]{3,9})', query, re.I):
        print(f"{query:{max_indent}}- {resp.groupdict()}")
    else:
        print(f"{query:{max_indent}}- 'Not a date'")
Result:-
14 10 - {'date': '14', 'month': '10'}
14.10 - {'date': '14', 'month': '10'}
1410 - 'Not a date'
14-10 - {'date': '14', 'month': '10'}
14/10 - {'date': '14', 'month': '10'}
14,10 - {'date': '14', 'month': '10'}
17800 - 'Not a date'
14th oct - {'date': '14', 'month': 'oct'}
14thoct - {'date': '14', 'month': 'oct'}
14th-oct - {'date': '14', 'month': 'oct'}
14th/oct - {'date': '14', 'month': 'oct'}
14-oct - {'date': '14', 'month': 'oct'}
14.oct - {'date': '14', 'month': 'oct'}
14oct - {'date': '14', 'month': 'oct'}
I'm not sure whether you want to exclude 1410 as a bare 4-digit number or (1410) with the parentheses, but you can exclude both by asserting that the date is not the start of 4 consecutive digits:
(?P<date>\b(?!\d{4}\b)\d{1,2})(?:st|[nr]d|th)?[\s./_\\,-]*(?P<month>\d{1,2}|[a-z]{3,9})
To not match any date between parentheses:
\([^()]*\)|(?P<date>\b\d{1,2})(?:st|[nr]d|th)?[\s./_\\,-]*(?P<month>\d{1,2}|[a-z]{3,9})
\([^()]*\) Match from opening till closing parenthesis
| Or
(?P<date>\b\d{1,2}) Match 1-2 digits
(?:st|[nr]d|th)? Optionally match st nd rd th
[\s./_\\,-]* Optionally repeat matching any of the listed
(?P<month>\d{1,2}|[a-z]{3,9}) Match 1-2 digits or 3-9 chars a-z
For example
import re
pattern = r"\([^()]*\)|(?P<date>\b\d{1,2})(?:st|[nr]d|th)?(?:[\s./_\\,-]*)(?P<month>\d{1,2}|[a-z]{3,9})"
strings = ["14th oct", "14oct", "14.10", "14 10", "14/10", "1410", "(1410)"]
for s in strings:
    m = re.search(pattern, s, re.I)
    # the date group is only set when the second alternative matched
    if m and m.group("date"):
        print(m.groupdict())
    else:
        print(f"{s} --> Not valid")
Output
{'date': '14', 'month': 'oct'}
{'date': '14', 'month': 'oct'}
{'date': '14', 'month': '10'}
{'date': '14', 'month': '10'}
{'date': '14', 'month': '10'}
{'date': '14', 'month': '10'}
(1410) --> Not valid
How to change the search pattern for this?
You might try a negative lookbehind assertion for a literal ( combined with a negative lookahead assertion for a literal ), as follows:
import re
query = '14.oct'
noquery = '(1410)'
print(re.search(r'(?<!\()(?P<date>\b\d{1,2})(?:\b|st|nd|rd|th)?(?:[\s\.\-/_\\,]*)(?P<month>\d{1,2}|[a-z]{3,9})(?!\))', query, re.I).groupdict())
print(re.search(r'(?<!\()(?P<date>\b\d{1,2})(?:\b|st|nd|rd|th)?(?:[\s\.\-/_\\,]*)(?P<month>\d{1,2}|[a-z]{3,9})(?!\))', noquery, re.I))
output
{'date': '14', 'month': 'oct'}
None
Beware that it does prevent all bracketed forms, i.e. not only (1410) but also (14 10), (14/10) and so on.
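The two ideas from the answers in this thread (skip anything in parentheses, and refuse a date that starts a longer run of digits) can be combined in one helper. This is only a sketch: find_date_month and PATTERN are made-up names, and the pattern has been checked only against the sample strings from the question.

```python
import re

# First alternative consumes parenthesised text so it can be skipped;
# second alternative is the date/month pattern with the (?!\d{3,}\b) guard.
PATTERN = re.compile(
    r"\([^()]*\)"
    r"|(?P<date>\b(?!\d{3,}\b)\d{1,2})(?:st|[nr]d|th)?[\s./_\\,-]*"
    r"(?P<month>\d{1,2}|[a-z]{3,9})",
    re.I,
)

def find_date_month(text):
    """Return {'date': ..., 'month': ...} for the first date found, else None."""
    for match in PATTERN.finditer(text):
        # A parenthesised match leaves the named groups unset, so keep looking
        if match.group("date"):
            return match.groupdict()
    return None

for query in ["14th oct", "14/10", "1410", "(1410)", "(14 10)"]:
    print(query, "->", find_date_month(query))
```

Using finditer instead of search means a bracketed number early in the string does not hide a real date that comes later.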

Best way to break apart long string using redshift SQL (included in question)?

I'm looking for the best way to break apart this blob of information into columns:
DATE
AMOUNT
TYPE
UNDISCLOSED
INVESTORS
INVESTORS WEBSITES
[{'date': 'Mon Aug 07 00:00:00 UTC 2004', 'amount': '1900000', 'type': 'Series D', 'undisclosed': 'false', 'investor': [{'name': 'Jobius Venture', 'website': 'jobiusvc.com'}]}, {'date': 'Tues July 06 00:00:00 UTC 2010', 'amount': '12000000000', 'type': 'Series A1', 'undisclosed': 'false', 'investor': [{'name': 'Fatthead Partners', 'website': 'fpartnazs.com'}, {'name': 'Jobius Venture', 'website': 'jobiusvc.com'}, {'name': 'Pista Pentures ', 'website': 'pisptavc.com'}]}, {'date': 'Sat Jun 01 00:00:00 UTC 2015', 'amount': '10000000000', 'type': 'Series X', 'undisclosed': 'false', 'investor': [{'name': 'Fatthead Partners', 'website': 'fpartnazs.com'}, {'name': 'Jobius Venture', 'website': 'jobiusvc.com'}, {'name': 'Pista Pentures', 'website': 'vistavc.com'}]}, {'date': 'Sun Aug 31 00:00:00 UTC 2015', 'amount': '3913000', 'type': 'Unknown', 'undisclosed': 'false'}, {'date': 'Mon Aug 12 00:00:00 UTC 2023', 'amount': '40000', 'type': 'Series D34', 'undisclosed': 'false', 'investor': [{'name': 'Fatthead Partners', 'website': 'fpartnazs.com'}, {'name': 'Jobius Venture', 'website': 'jobiusvc.com'}]}]
Your output is almost in JSON format.
For JSON, you could use: JSON_EXTRACT_PATH_TEXT Function - Amazon Redshift
However, it seems that the quotation marks are not standard JSON. It should use double-quotes (") in JSON, not single quotes (').
Also, the string appears to start with a List ([...]), which makes it incompatible with the JSON functions. A JSON object would normally be in {..} braces.
The output looks more like it came from a Python program. If so, and you have access to the Python program, it would be better to have it output in correct JSON format, so that you could use the above function. (Or just output the fields you actually want.)
You could write a Python User-Defined Function to do the conversion, such as:
create or replace function f_parse (str varchar(2000))
returns varchar
stable
as $$
return eval(str)[0]['date']
$$ language plpythonu;
Then:
select f_parse(s) from table
Results in: Mon Aug 07 00:00:00 UTC 2004
However, it appears that multiple records are in that one line, so I really suggest that you get a better version of the input data rather than trying to parse that line.
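If the data can be cleaned up in Python before it is loaded, something along these lines would turn the blob into one flat row per funding round. It is a sketch under assumptions: the blob is a valid Python literal like the one in the question, sample is a shortened copy of it, and flatten_rounds plus the output column names are invented for illustration.

```python
import ast

def flatten_rounds(blob):
    """Turn the Python-literal blob into a list of flat dicts, one per round."""
    rounds = ast.literal_eval(blob)  # accepts single-quoted Python literals
    rows = []
    for rnd in rounds:
        # Some rounds have no 'investor' key at all
        investors = rnd.get("investor", [])
        rows.append({
            "date": rnd["date"],
            "amount": int(rnd["amount"]),
            "type": rnd["type"],
            "undisclosed": rnd["undisclosed"] == "true",
            "investors": ", ".join(i["name"].strip() for i in investors),
            "investor_websites": ", ".join(i["website"] for i in investors),
        })
    return rows

# Shortened sample: the first and fourth records from the question
sample = (
    "[{'date': 'Mon Aug 07 00:00:00 UTC 2004', 'amount': '1900000', "
    "'type': 'Series D', 'undisclosed': 'false', "
    "'investor': [{'name': 'Jobius Venture', 'website': 'jobiusvc.com'}]}, "
    "{'date': 'Sun Aug 31 00:00:00 UTC 2015', 'amount': '3913000', "
    "'type': 'Unknown', 'undisclosed': 'false'}]"
)

rows = flatten_rounds(sample)
for row in rows:
    print(row)
```

Each resulting dict can then be written out as a CSV line or as proper JSON (json.dumps), either of which is much easier to load than the original string.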

Change dates in list with multiple dictionaries in Python

I have a list with multiple dictionaries, like the following:
[{'Date': '6-1-2017', 'Rate':'0.3', 'Type':'A'},
{'Date': '6-1-2017', 'Rate':'0.4', 'Type':'B'},
{'Date': '6-1-2017', 'Rate':'0.6', 'Type':'F'},
{'Date': '6-1-2017', 'Rate':'0.1', 'Type':'B'}
]
I would now like to change the dates, because they need to be in the format 'yyymmdd', where the year is counted from 1900 (the format starts at 1900-01-01). In other words, I would like to change '6-1-2017' to '1170106'.
As this has to be done every week (with the then current date), I do not want to change this by hand. So next week, '13-1-2017' has to be transformed into '1170113'.
Any ideas how to do this? I have tried several things, but I can't even get my code to select the date values of all dictionaries.
Many thanks!
You can use the datetime module, which provides a lot of functionality to manipulate datetime objects including converting datetime to string and the way back, accessing different components of the datetime object, etc:
from datetime import datetime
for l in lst:
    # parse the day-month-year string, then build years-since-1900 + MMDD
    l['Date'] = datetime.strptime(l['Date'], "%d-%m-%Y")
    l['Date'] = str(l['Date'].year - 1900) + l['Date'].strftime("%m%d")
lst
#[{'Date': '1170106', 'Rate': '0.3', 'Type': 'A'},
# {'Date': '1170106', 'Rate': '0.4', 'Type': 'B'},
# {'Date': '1170106', 'Rate': '0.6', 'Type': 'F'},
# {'Date': '1170106', 'Rate': '0.1', 'Type': 'B'}]
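Because this has to run every week, it may be handier to wrap the conversion in a small function and build a new list rather than overwrite the input. A minimal sketch, assuming every 'Date' value is day-month-year; to_cyymmdd is a hypothetical name.

```python
from datetime import datetime

def to_cyymmdd(date_text):
    """Convert 'D-M-YYYY' into years-since-1900 followed by MMDD."""
    parsed = datetime.strptime(date_text, "%d-%m-%Y")
    # f-string format specs on datetime objects are passed to strftime
    return f"{parsed.year - 1900}{parsed:%m%d}"

lst = [
    {"Date": "6-1-2017", "Rate": "0.3", "Type": "A"},
    {"Date": "13-1-2017", "Rate": "0.4", "Type": "B"},
]

# New dicts with only the 'Date' value replaced
converted = [{**row, "Date": to_cyymmdd(row["Date"])} for row in lst]
print(converted)
```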

I am getting this sort of CSV data while making Http request to the CSV file. Very malformed string [closed]

Closed 9 years ago.
I am getting this sort of CSV data while making Http request to the CSV file. Very malformed string.
response = '"Subject";"Start Date";"Start Time";"End Date";"End Time";"All day event";"Description""Play football";"16/11/2009";"10:00 PM";"16/11/2009";"11:00 PM";"false";"""Watch 2012";"20/11/2009";"07:00 PM";"20/11/2009";"08:00 PM";"false";""'
And I want to convert this into a list of dictionaries:
[{"Subject": "Play football", "Start Date": "16/11/2009", "Start Time": "10:00 PM", "End Date": "16/11/2009", "End Time": "11:00 PM", "All day event": false, "Description": ""},
{"Subject": "Watch 2012", "Start Date": "20/11/2009", "Start Time": "07:00 PM", "End Date": "20/11/2009", "End Time": "08:00 PM", "All day event": false, "Description": ""}]
I tried solving this using python csv module but didn't work.
import csv
from cStringIO import StringIO
>>> str_obj = StringIO(response)
>>> reader = csv.reader(str_obj, delimiter=';')
>>> [x for x in reader]
[['Subject',
'Start Date',
'Start Time',
'End Date',
'End Time',
'All day event',
'Description"Play football',
'16/11/2009',
'10:00 PM',
'16/11/2009',
'11:00 PM',
'false',
'"Watch 2012',
'20/11/2009',
'07:00 PM',
'20/11/2009',
'08:00 PM',
'false',
'']]
I get the above result.
Any sort of help will be appreciated. Thanks in advance.
Here's a pyparsing solution:
from pyparsing import QuotedString, Group, delimitedList, OneOrMore
# a row of headings or data is a list of quoted strings, delimited by ';'s
qs = QuotedString('"')
datarow = Group(delimitedList(qs, ';'))
# an entire data set is a single data row containing the headings, followed by
# one or more data rows containing the data
dataset_parser = datarow("headings") + OneOrMore(datarow)("rows")
# parse the returned response
data = dataset_parser.parseString(response)
# create dict by zipping headings with each row's data values
datadict = [dict(zip(data.headings, row)) for row in data.rows]
print(datadict)
Prints:
[{'End Date': '16/11/2009', 'Description': '', 'All day event': 'false',
'Start Time': '10:00 PM', 'End Time': '11:00 PM', 'Start Date': '16/11/2009',
'Subject': 'Play football'},
{'End Date': '20/11/2009', 'Description': '', 'All day event': 'false',
'Start Time': '07:00 PM', 'End Time': '08:00 PM', 'Start Date': '20/11/2009',
'Subject': 'Watch 2012'}]
This will also handle the case if the quoted strings contain embedded semicolons.
Here's one approach.
I notice there is no delimiter between rows. In an effort to clean up the input data, I make a few assumptions:
The first "row" is the "heading" of a "table", these will be our dictionary keys
There are no empty fields in the first row (ie: no "")
Any other field can be empty (ie: "")
The first occurrence of two successive " indicates the end of the heading row
First I create a response based on your input string:
>>> response = '"Subject";"Start Date";"Start Time";"End Date";"End Time";"All day event";"Description""Play football";"16/11/2009";"10:00 PM";"16/11/2009";"11:00 PM";"false";"""Watch 2012";"20/11/2009";"07:00 PM";"";"08:00 PM";"false";"""";"17/11/2009";"9:00 AM";"17/11/2009";"10:00 AM";"false";""'
Note that
the "End Date" for "Watch 2012" is empty
there is a third event with an empty "Subject" field
These two modifications illustrate some "edge cases" I'm concerned about.
First I will replace all occurrences of two consecutive " with a pipe (|) and strip out all other " characters because I don't need them:
>>> response.replace('""', '|').replace('"', '')
'Subject;Start Date;Start Time;End Date;End Time;All day event;Description|Play football;16/11/2009;10:00 PM;16/11/2009;11:00 PM;false;|Watch 2012;20/11/2009;07:00 PM;|;08:00 PM;false;||;17/11/2009;9:00 AM;17/11/2009;10:00 AM;false;|'
If we had any empty cells not at the start or end of a row (ie: Watch 2012's End Date), it looks like this: ;|; -- let's simply leave it blank:
>>> response.replace('""', '|').replace('"', '').replace(';|;', ';;')
'Subject;Start Date;Start Time;End Date;End Time;All day event;Description|Play football;16/11/2009;10:00 PM;16/11/2009;11:00 PM;false;|Watch 2012;20/11/2009;07:00 PM;;08:00 PM;false;||;17/11/2009;9:00 AM;17/11/2009;10:00 AM;false;|'
Now the | indicates the split between the heading row and the next row. What happens if we split our string on |?
>>> response.replace('""', '|').replace('"', '').replace(';|;', ';;').split('|')
['Subject;Start Date;Start Time;End Date;End Time;All day event;Description',
'Play football;16/11/2009;10:00 PM;16/11/2009;11:00 PM;false;',
'Watch 2012;20/11/2009;07:00 PM;;08:00 PM;false;',
'',
';17/11/2009;9:00 AM;17/11/2009;10:00 AM;false;',
'']
Looks like we're getting somewhere. There's a problem, though; there are two items in that list that are just the empty string ''. They're there because we sometimes have a | at the end of a row and the beginning of the next row, and splitting creates an empty element:
>>> "a|b||c".split('|')
['a', 'b', '', 'c']
The same goes for a lone delimiter at the end of a line, too:
>>> "a||b|c|".split('|')
['a', '', 'b', 'c', '']
Let's filter our list to drop those empty "rows":
>>> rows = [row for row in response.replace('""', '|').replace('"', '').replace(';|;', ';;').split('|') if row]
>>> rows
['Subject;Start Date;Start Time;End Date;End Time;All day event;Description',
'Play football;16/11/2009;10:00 PM;16/11/2009;11:00 PM;false;',
'Watch 2012;20/11/2009;07:00 PM;;08:00 PM;false;',
';17/11/2009;9:00 AM;17/11/2009;10:00 AM;false;']
That's it for massaging the input; now we just need to build the dictionary. First, let's get the dictionary keys:
>>> dict_keys = rows[0].split(';')
>>> dict_keys
['Subject',
'Start Date',
'Start Time',
'End Date',
'End Time',
'All day event',
'Description']
And build a list of dictionaries, one for each event:
>>> events = []
>>> for row in rows[1:]:
...     d = {}
...     for k, v in zip(dict_keys, row.split(';')):
...         d[k] = v
...     events.append(d)
...
>>> events
[{'All day event': 'false',
'Description': '',
'End Date': '16/11/2009',
'End Time': '11:00 PM',
'Start Date': '16/11/2009',
'Start Time': '10:00 PM',
'Subject': 'Play football'},
{'All day event': 'false',
'Description': '',
'End Date': '',
'End Time': '08:00 PM',
'Start Date': '20/11/2009',
'Start Time': '07:00 PM',
'Subject': 'Watch 2012'},
{'All day event': 'false',
'Description': '',
'End Date': '17/11/2009',
'End Time': '10:00 AM',
'Start Date': '17/11/2009',
'Start Time': '9:00 AM',
'Subject': ''}]
Hope that helps!
Some notes:
if you expect | to appear in your data, you might want to encode it first; or use a different delimiter
supporting quotes in the data might be tricky (ie: 'Subject': 'Watching "2012"')
I leave conversion of 'All day event' values from string to boolean as an exercise to the reader :D
Are you sure this is the response you got?
It looks corrupted to me, and in that case no reader will be able to make sense of it.
First fix the response; then parsing will go better:
response = response.split(';') # split it into words
response = [w[1:-1] for w in response] # strip off the quotes
response = [w.replace('""','"\n"') for w in response] # add in the newlines
response = ['"%s"'%w for w in response] # add the quotes back
response = ';'.join(response)
But it won't work if you have a ";" character in the data that should have been escaped. You should find what happened to the missing newlines in the first place.
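Another way to recover the rows with only the standard library is to walk over the quoted fields and treat any field that is not followed by a ; as the end of a row. This is a sketch, not a general CSV parser: it assumes every field is wrapped in double quotes and that no field contains a double quote itself. parse_events is a made-up name.

```python
import re

def parse_events(response):
    """Split the newline-less response into dicts keyed by the heading row."""
    rows, current = [], []
    # Each token is a quoted field, optionally followed by the ';' delimiter
    for match in re.finditer(r'"([^"]*)"(;?)', response):
        current.append(match.group(1))
        if not match.group(2):  # no ';' after the field: the row ends here
            rows.append(current)
            current = []
    if current:  # tolerate a trailing ';' after the last field
        rows.append(current)
    if not rows:
        return []
    headings, *data = rows
    return [dict(zip(headings, row)) for row in data]

response = ('"Subject";"Start Date";"Start Time";"End Date";"End Time";'
            '"All day event";"Description""Play football";"16/11/2009";'
            '"10:00 PM";"16/11/2009";"11:00 PM";"false";"""Watch 2012";'
            '"20/11/2009";"07:00 PM";"20/11/2009";"08:00 PM";"false";""')

events = parse_events(response)
for event in events:
    print(event)
```

Fields that contain semicolons are fine with this approach, since the regex only looks at what follows each closing quote.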
