I've massaged the data into the list structure…
[['keychain: "keychainname.keychain-db"',
'version: 512',
'class: 0x0000000F ',
'attributes:\n long string containing : and \n that needs to be split up into a list (by newline) of dictionaries (by regex and/or split() function) '],
['keychain: "keychainname.keychain-db"',
'version: 512',
'class: 0x0000000F ',
'attributes:\n long string that needs to be split up '],
['keychain: "keychainname.keychain-db"',
'version: 512',
'class: 0x0000000F ',
'attributes:\n long string that needs to be split up']]
I'm trying to use a comprehension to take each item in the list and split it into a dictionary with the format…
[{'keychain': 'keychainname.keychain-db',
'version': '512',
'class': '0x0000000F',
'attributes': '\n long string containing : and \n that needs to be split up into a dictionary (by newline) of dictionaries (by regex and/or split() function) '}]
The following for loop seems to work…
newdata = []
for item in data:
    eachdict = {}
    for each in item:
        new = each.split(':', 1)
        eachdict[new[0]] = new[1]
    newdata.append(eachdict)
But my attempt at a comprehension does not…
newdata = [[{key:value for item in data} for line in item] for key, value in (line.split(':', 1))]
This comprehension runs, but it doesn't have the nesting done correctly…
newdata = [{key:value for item in data} for key, value in (item.split(':', 1),)]
I've just started learning comprehensions, and I've been able to use them successfully to get the data into the nested-list format above, but I'm struggling to understand the nesting once I'm going down three levels and switching from list to dictionary.
I'd appreciate some pointers on how to tackle the problem.
For bonus points, I'll need to split the long string inside the attributes key into a dictionary of dictionaries as well. I'd like to be able to reference the 'alis' key, the 'labl' key, and so on. I can probably figure that out on my own once I learn how to use nested comprehensions in the example above.
attributes:\n
"alis"<blob>="com.company.companyvpn.production.vpn.5D5AF9C525C25350E9CD5CEBED824BFD60E42110"\n
"cenc"<uint32>=0x00000003 \n
"ctyp"<uint32>=0x00000001 \n
"hpky"<blob>=0xB7262C7D5BCC976744F8CA6DE5A80B755622D434 "\\267&,}[\\314\\227gD\\370\\312m\\345\\250\\013uV"\\3244"\n
"issu"<blob>=0x306E3128302606035504030C1F4170706C6520436F72706F726174652056504E20436C69656E7420434120313120301E060355040B0C1743657274696669636174696F6E20417574686F7269747931133011060355040A0C0A4170706C6520496E632E310B3009060355040613025553 "0n1(0&\\006\\003U\\004\\003\\014\\037Company Corporate VPN Client CA 11 0\\036\\006\\003U\\004\\013\\014\\027Certification Authority1\\0230\\021\\006\\003U\\004\\012\\014\\012Company Inc.1\\0130\\011\\006\\003U\\004\\006\\023\\002US"\n
"labl"<blob>="com.company.companyvpn.production.vpn.5D5AF9C525C25350E9CD5CEBED824BFD60E42110"\n
"skid"<blob>=0xB7262C7D5BCC976744F8CA6DE5A80B755622D434 "\\267&,}[\\314\\227gD\\370\\312m\\345\\250\\013uV"\\3244"\n "snbr"<blob>=0x060A02F6F9880D69 "\\006\\012\\002\\366\\371\\210\\015i"\n
"subj"<blob>=0x3061315F305D06035504030C56636F6D2E6170706C652E6973742E64732E6170706C65636F6E6E656374322E70726F64756374696F6E2E76706E2E35443541463943353235433235333530453943443543454245443832344246443630453432313130 "0a1_0]\\006\\003U\\004\\003\\014Vcom.company.companyvpn.production.vpn.5D5AF9C525C25350E9CD5CEBED824BFD60E42110"'
For context…
I'm using the output of "security dump-keychain" on the Mac to make a nice Python data structure to find keys. The check_output of this command is a long string with some inconsistent formatting and embedded newlines that I need to clean up to get the data into a list of dictionaries of dictionaries.
For those interested in Mac admin topics, this is so we can remove items that save the Active Directory password when the AD password is reset so that the account doesn't get locked by, say, Outlook presenting the old password to Exchange over and over.
Here might be an approach:
data = [['keychain: "keychainname.keychain-db"', 'version: 512', 'class: 0x0000000F ', 'attributes:\n long string containing : and \n that needs to be split up into a list (by newline) of dictionaries (by regex and/or split() function) '], ['keychain: "keychainname.keychain-db"', 'version: 512', 'class: 0x0000000F ', 'attributes:\n long string that needs to be split up '], ['keychain: "keychainname.keychain-db"', 'version: 512', 'class: 0x0000000F ', 'attributes:\n long string that needs to be split up']]
result = [dict([item.split(':', 1) for item in items]) for items in data]
>>> [{'keychain': ' "keychainname.keychain-db"', 'version': ' 512', 'class': ' 0x0000000F ', 'attributes': '\n long string containing : and \n that needs to be split up into a list (by newline) of dictionaries (by regex and/or split() function) '}, {'keychain': ' "keychainname.keychain-db"', 'version': ' 512', 'class': ' 0x0000000F ', 'attributes': '\n long string that needs to be split up '}, {'keychain': ' "keychainname.keychain-db"', 'version': ' 512', 'class': ' 0x0000000F ', 'attributes': '\n long string that needs to be split up'}]
The split breaks up each individual string into a key, value pair.
The inside list comprehension loops through each key, value pair in an individual item. The outside list comprehension loops through each item in the main list.
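For the bonus part, one sketch of splitting the attributes blob into its own dictionary, assuming the lines follow the `"name"<type>=value` pattern shown in the question (the helper name and sample blob here are mine, not from the original code):

```python
import re

def parse_attributes(blob):
    """Hypothetical helper: turn one attributes blob into a dict keyed by
    the four-character attribute name, e.g. {'alis': ..., 'labl': ...}."""
    attrs = {}
    for line in blob.splitlines():
        # Match lines shaped like: "labl"<blob>="some value"
        m = re.match(r'\s*"(\w+)"<\w+>=(.*)', line)
        if m:
            attrs[m.group(1)] = m.group(2).strip().strip('"')
    return attrs

blob = '\n"alis"<blob>="com.example.id"\n"cenc"<uint32>=0x00000003'
print(parse_attributes(blob))  # {'alis': 'com.example.id', 'cenc': '0x00000003'}
```

The result of `parse_attributes` could then be assigned back onto the 'attributes' key of each dictionary produced by the comprehension above.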
Related
[ '[{ growth=FLOW, label=BOW} ]',
'[{ growth=FLOW1, label=BOW1}, {growth=MID1, label= pow1} ]',
'[{growth=FLOW2, label=BOW2}, {growth=MID1, label= pow1} ]' ]
How can I remove the string formatting and turn these into actual lists of dictionaries?
Any suggestions are appreciated.
eg: Data Sample
date cord bound_box
0 2020-08-07T02:40:25 14.1561926,121.2731238 '[{width=0.05, x=0.46879336, growth_state=EARLY, y=0.44942817, label=CLERT, height=0.05}]'
1 2020-07-22T23:36:35 37.2349683,-80.4365232 '[{width=0.05, x=0.43004116, growth_state=BRANCHING, y=0.48765433, label=CIRVU, height=0.05}]'
2 2020-08-25T01:17:35 14.1737223,121.2563773 '[{width=0.05, x=0.43387097, growth_state=MID, y=0.37651333, label=MIMPU, height=0.05}]'
3 2020-04-27T18:04:10 53.0833487,-2.0382104 '[{width=0.05, x=0.31318682, growth_state=MID, y=0.52674896, label=GAETE, height=0.05}, {width=0.05, x=0.7967033, growth_state=EARLY, y=0.7105624, label=GAETE, height=0.05}, {width=0.05, x=0.35897437, growth_state=MID, y=0.3058985, label=GAETE, height=0.05}]'
Also, there is a slight mistake in the answer by #pts. Could you please correct it? The sublists have been merged too.
current format in third column ==> [ '[]', '[{}, {}]', '[{}, {}, {}]',... ]
Desired Format ==> [ [], [{}, {}], [{}, {}, {}],... ]
Here is a data sample (as requested by #pts) to give the reader some context for why we need this preprocessing. The main aim is to get each list and its inner dictionaries back into their normal form so that we can finally flatten the third column (bound_box) per list. I hope this gives #pts some idea of why I was trying to do this. Also, apart from regular expressions, is there a simpler way to process the lists in the third column that a noob like me could understand? Thank you for your answer.
Something like this works in Python:
import re
data = r'''
[ '[{ growth=FLOW, label=BOW} ]'
  '[{ growth=FLOW1, label=BOW1}, {growth=MID1, label= pow1} ]'
  '[{growth=FLOW2, label=BOW2}, {growth=MID1, label= pow1} ]' ]
'''
data = re.sub(r'[^A-Za-z{}=]+', ' ', data)
data = re.sub(r'= *', '=', data)
data = re.sub(r'} *{', '\n', data)
data = re.sub(r'[{} ,]+', ' ', data)
data = re.sub(r' *\n *', '\n', data).strip(' \n')
data = re.sub(r'(\S+)=(\S+)', r"'\1': '\2',", data)
data = '[%s]' % ', '.join(
    '{%s}' % line.rstrip(',') for line in data.split('\n'))
print(data)
Output:
[{'growth': 'FLOW', 'label': 'BOW'}, {'growth': 'FLOW', 'label': 'BOW'}, {'growth': 'MID', 'label': 'pow'}, {'growth': 'FLOW', 'label': 'BOW'}, {'growth': 'MID', 'label': 'pow'}]
It uses regular expression substitutions to transform the string in multiple steps. To see how it works in more detail, you may want to add print(data); print('') lines between the data = ... lines.
Detailed explanation of this line follows:
data = re.sub(r'(\S+)=(\S+)', r"'\1': '\2',", data)
In the line above the regular expression is (\S+)=(\S+). \S matches a non-whitespace character (e.g. g), \S+ matches one or more non-whitespace characters (e.g. growth), = matches itself, and the parentheses make the matched characters available as \1 and \2 in the replacement. The replacement is '\1': '\2', and \1 will be substituted with the characters matched by the first \S+, and \2 will be substituted with the characters matched by the second \S+. The re.sub call replaces all occurrences from left to right, non-overlapping. An example: it replaces growth=FLOW with 'growth': 'FLOW',.
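Note that the final result is still a string that merely looks like a list literal. If an actual Python object is needed afterwards, one way (a small sketch) is to parse it with ast.literal_eval:

```python
import ast

# A string shaped like the pipeline's final output
s = "[{'growth': 'FLOW', 'label': 'BOW'}, {'growth': 'MID', 'label': 'pow'}]"

# literal_eval parses Python literals safely (no arbitrary code execution)
result = ast.literal_eval(s)
print(result[0]['growth'])  # FLOW
```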
I need to update a list in Python, which is:
data = [{' Customers ','null,blank '},{' CustomersName ','max=50,null,blank '},{' CustomersAddress ','max=150,blank '},{' CustomersActive ','Active '}]
I want to write a lambda expression to store Customers and CustomersName in the list and remove the white spaces.
I am absolutely new to Python and don't have any knowledge of it!
As I see it, you have declared a dictionary inside a list, but the dict is wrong; it should be {"key": "value"}. So I assume you need to change the items to lists, as such:
data = [[' Customers ','null,blank '],[' CustomersName ','max=50,null,blank '],[' CustomersAddress ','max=150,blank '],[' CustomersActive ','Active ']]
And then the following would get you your desired result:
data_NameExtracted = [x[0].strip() for x in data]
You cannot put this inside a lambda expression, but you can use a comprehension (or a generator expression) like this:
# please note that i have used tuples instead of sets,
# because sets are unordered
data = [
(' Customers ','null,blank '),
(' CustomersName ','max=50,null,blank '),
(' CustomersAddress ','max=150,blank '),
(' CustomersActive ','Active ')
]
# Indexing is not allowed for set objects
values = [item[0].strip() for item in data]
see:
https://wiki.python.org/moin/Generators
https://docs.python.org/3/tutorial/datastructures.html#sets
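As an aside, if a lazy generator is actually wanted, the square brackets of the comprehension above become parentheses. A small sketch:

```python
data = [
    (' Customers ', 'null,blank '),
    (' CustomersName ', 'max=50,null,blank '),
]

# Parentheses instead of square brackets give a generator expression;
# values are produced one at a time as the generator is consumed
gen = (item[0].strip() for item in data)
print(list(gen))  # ['Customers', 'CustomersName']
```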
EDIT:
If you want to use dictionaries, you could use something like this:
data = [
{' Customers ': 'null,blank '},
{' CustomersName ': 'max=50,null,blank '},
{' CustomersAddress ': 'max=150,blank '},
{' CustomersActive ': 'Active '}
]
# expecting a single value in each dict; list() is needed because
# dict views are not indexable in Python 3
values = [list(item.values())[0].strip() for item in data]
I wrote a spider and it returns data which is littered with spaces and newline characters. The newline characters also cause the extract() method to return a list. How do I filter these before they reach the selector? Filtering them after extract() is called breaks the DRY principle, because there is a lot of attributeless data on the page I need to extract, which makes indexing the only way to parse it.
How do I filter these?
Source
it returns bad data like this
{ 'aired': ['\n ', '\n Apr 3, 2016 to Jun 26, 2016\n '],
'broadcast': [], 'duration': ['\n ', '\n 24 min. per ep.\n '], 'episodes': ['\n ', '\n 13\n '], 'favourites': ['\n ', '\n 22,673\n'], 'genres': ['Action', 'Comedy', 'School', 'Shounen', 'Super Power'], 'image_url': ['https://myanimelist.cdn-dena.com/images/anime/10/78745.jpg',
'https://myanimelist.cdn-dena.com/images/anime/10/78745.jpg',
'https://myanimelist.cdn-dena.com/images/anime/10/78745.jpg',
'https://myanimelist.cdn-dena.com/images/anime/10/78745.jpg',
'https://myanimelist.cdn-dena.com/images/anime/10/78745.jpg',
'https://myanimelist.cdn-dena.com/images/anime/10/78745.jpg',
'https://myanimelist.cdn-dena.com/images/anime/10/78745.jpg',
'https://myanimelist.cdn-dena.com/images/anime/10/78745.jpg',
'https://myanimelist.cdn-dena.com/images/anime/10/78745.jpg',
'https://myanimelist.cdn-dena.com/images/anime/10/78745.jpg',
'https://myanimelist.cdn-dena.com/images/anime/10/78745.jpg',
'https://myanimelist.cdn-dena.com/images/anime/10/78745.jpg',
'https://myanimelist.cdn-dena.com/images/anime/10/78745.jpg',
'https://myanimelist.cdn-dena.com/images/anime/10/78745.jpg',
'https://myanimelist.cdn-dena.com/images/anime/10/78745.jpg'], 'licensors': ['Funimation'], 'members': ['\n ', '\n 818,644\n'], 'popularity': ['\n ', '\n #21\n'], 'premiered': ['Spring 2016'], 'producers': ['Dentsu',
'Mainichi Broadcasting System',
'Movic',
'TOHO animation',
'Shueisha'], 'ranked': ['\n ', '\n #135', '\n ', '\n'], 'rating': ['\n ', '\n PG-13 - Teens 13 or older\n '], 'score': ['8.44'], 'source': ['\n ', '\n Manga\n '], 'status': ['\n ', '\n Finished Airing\n '], 'studios': ['Bones'], 'title': 'Boku no Hero Academia', 'type': ['TV']}
Edit: The link to source code is different from the time of posting, to see the code back then take a look at commit faae4aff1f998f5589fab1616d21c7afc69e03eb
Looking at your code, you could try using XPath's normalize-space:
mal_item['aired'] = border_class.xpath('normalize-space(.//div[11]/text())').extract()
*untested, but seems legit.
For a more general answer, yourString.strip('someChar') or yourString.replace('this', 'withThis') work well (though when operating on JSON objects they may be less efficient than other approaches). If those characters are present in the original data, you need to manually remove or skip them.
The newline characters also caused extract() method to return as a list
It is not the line breaks that cause this behavior but the way nodes appear in the document tree. Text nodes separated by element nodes such as <a>, <br>, or <hr> are seen as separate entities, and Scrapy will yield them as such (in fact, extract() always returns a list, even when only a single node was selected). XPath has several basic value-processing/filtering functions, but they are very limited.
Filtering these after extract() is called breaks the DRY principle
You seem convinced that the only correct way to filter these outputs is within the selector expression, but there is no use being so stringent about the principles: you are selecting text nodes from inside your target nodes, and these are bound to have excess whitespace or be scattered around their containers. XPath filtering by content is very sluggish, so it should be done outside of the selector. Post-processing scraped fields is common practice. You might want to read about Scrapy loaders and processors.
Otherwise the simplest way is:
import re
...
def join_clean(texts):
    return re.sub(r'\s+', ' ', ' '.join(texts)).strip()
...
mal_item['type'] = join_clean(border_class.xpath('.//div[8]/a/text()').extract())
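For instance, applied to an extract() result shaped like the question's 'aired' field (the sample list here is made up):

```python
import re

def join_clean(texts):
    # Collapse all runs of whitespace (including newlines) into single spaces
    return re.sub(r'\s+', ' ', ' '.join(texts)).strip()

# A made-up extract() result with the leading/trailing whitespace noise
raw = ['\n        ', '\n  Apr 3, 2016 to Jun 26, 2016\n  ']
print(join_clean(raw))  # Apr 3, 2016 to Jun 26, 2016
```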
Just a new user of scrapy.org and a newbie to Python here. I have values in the brand and title properties (to use a Java OOP term) that contain tab spaces and newlines. How can I trim them so that these two object properties have the following plain string values?
item['brand'] = "KORAL ACTIVEWEAR"
item['title'] = "Boom Leggings"
Below is the data structure
{'store_id': 870, 'sale_price_low': [], 'brand': [u'\n KORAL ACTIVEWEAR\n '], 'currency': 'AUD', 'retail_price': [u'$140.00'], 'category': [u'Activewear'], 'title': [u'\n Boom Leggings\n '], 'url': [u'/boom-leggings-koral-activewear/vp/v=1/1524019474.htm?folderID=13331&fm=other-shopbysize-viewall&os=false&colorId=68136'], 'sale_price_high': [], 'image_url': [u' https://images-na.sample-store.com/images/G/01/samplestore/p/prod/products/kacti/kacti3025868136/kacti3025868136_q1_2-0._SH20_QL90_UY365_.jpg\n'], 'category_link': 'https://www.samplestore.com/clothing-activewear/br/v=1/13331.htm?baseIndex=500', 'store': 'SampleStore'}
I was able to trim the prices to only get the number and decimal by using the regex search method, which I think might break when a price contains a comma as a thousands separator.
price = re.compile('[0-9\.]+')
item['retail_price'] = filter(price.search, item['retail_price'])
It looks like all you need to do, at least for this example, is strip all whitespace off the edges of the brand and title values. You don't need a regex for that, just call the strip method.
However, your brand isn't a single string; it's a list of strings (even if there's only one string in the list). So, if you try to just strip it, or run a regex on it, you're going to get an AttributeError or TypeError from trying to treat that list as a string.
To fix this, you need to map the strip over all of the strings, with either the map function or a list comprehension:
item['brand'] = [brand.strip() for brand in item['brand']]
item['title'] = map(str.strip, item['title'])
… whichever of the two is easier for you to understand.
If you have other examples that have embedded runs of whitespace, and you want to turn every such run into exactly one space character, you need to use the sub method with your regex:
item['brand'] = [re.sub(ur'\s+', u' ', brand.strip()) for brand in item['brand']]
Notice the u prefixes. In Python 2, you need a u prefix to make a unicode literal instead of a str (encoded bytes) literal. And it's important to use Unicode patterns against Unicode strings, even if the pattern itself doesn't care about any non-ASCII characters. (If all of this seems like a pointless pain and a bug magnet—well, it is; that's the main reason Python 3 exists.)
As for the retail_price, the same basic observations apply. Again, it's a list of strings, not just a string. And again, you probably don't need regex. Assuming the price is always a $ (or other single-character currency marker) followed by a number, just slice off the $ and call float or Decimal on it:
item['retail_price'] = [float(price[1:]) for price in item['retail_price']]
… but if you have examples that look different, with arbitrary extra characters on both sides of the price, you can use re.search here, but you'll still need to map it, and to use a Unicode pattern.
You also need to grab the matching group out of the search, and to handle empty/invalid strings in some way (they'll return None for the search, and you can't convert that to a float). You have to decide what to do about it, but from your attempt with filter it looks like you just want to skip them. This is complicated enough that I'd do it in multiple steps:
prices = item['retail_price']
matches = (re.search(r'[0-9.]+', price) for price in prices)
groups = (match.group() for match in matches if match)
item['retail_price'] = map(float, groups)
… or maybe wrap that in a function.
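Such a wrapper might look like this (a sketch for Python 3; the function name is mine, not from the original code):

```python
import re

def extract_prices(strings):
    """Pull the first numeric run out of each string, skipping non-matches."""
    result = []
    for s in strings:
        m = re.search(r'[0-9.]+', s)
        if m:  # skip strings with no number at all
            result.append(float(m.group()))
    return result

print(extract_prices(['$140.00', '', 'N/A']))  # [140.0]
```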
You can define a method like the one below, which takes an object and returns it with all the leaves normalized.
import six

def normalize(obj):
    if isinstance(obj, six.string_types):
        return ' '.join(obj.split())
    elif isinstance(obj, list):
        return [normalize(x) for x in obj]
    elif isinstance(obj, dict):
        return {k: normalize(v) for k, v in obj.items()}
    return obj
This is a recursive method; it does not modify the original object but returns a normalized copy. You can also use it to normalize individual strings.
For your example item
>> item = {'store_id': 870, 'sale_price_low': [], 'brand': [u'\n KORAL ACTIVEWEAR\n '], 'currency': 'AUD', 'retail_price': [u'$140.00'], 'category': [u'Activewear'], 'title': [u'\n Boom Leggings\n '], 'url': [u'/boom-leggings-koral-activewear/vp/v=1/1524019474.htm?folderID=13331&fm=other-shopbysize-viewall&os=false&colorId=68136'], 'sale_price_high': [], 'image_url': [u' https://images-na.sample-store.com/images/G/01/samplestore/p/prod/products/kacti/kacti3025868136/kacti3025868136_q1_2-0._SH20_QL90_UY365_.jpg\n'], 'category_link': 'https://www.samplestore.com/clothing-activewear/br/v=1/13331.htm?baseIndex=500', 'store': 'SampleStore'}
>> print (normalize(item))
>> {'category': [u'Activewear'], 'store_id': 870, 'sale_price_low': [], 'title': [u'Boom Leggings'], 'url': [u'/boom-leggings-koral-activewear/vp/v=1/1524019474.htm?folderID=13331&fm=other-shopbysize-viewall&os=false&colorId=68136'], 'brand': [u'KORAL ACTIVEWEAR'], 'currency': 'AUD', 'image_url': [u'https://images-na.sample-store.com/images/G/01/samplestore/p/prod/products/kacti/kacti3025868136/kacti3025868136_q1_2-0._SH20_QL90_UY365_.jpg'], 'category_link': 'https://www.samplestore.com/clothing-activewear/br/v=1/13331.htm?baseIndex=500', 'sale_price_high': [], 'retail_price': [u'$140.00'], 'store': 'SampleStore'}
OK, I've been at this one for hours; I admit defeat and beg for your mercy.
goal: I have multiple files (bank statement downloads), and I want to
Merge, sort, remove duplicates.
the downloads are in this format:
"08/04/2015","Balance","5,804.30","Current Balance for account 123S14"
"08/04/2015","Balance","5,804.30","Available Balance for account 123S14"
"02/03/2015","241.25","Transaction description","2,620.09"
"02/03/2015","-155.49","Transaction description","2,464.60"
"03/03/2015","82.00","Transaction description","2,546.60"
"03/03/2015","243.25","Transaction description","2,789.85"
"03/03/2015","-334.81","Transaction description","2,339.12"
"04/03/2015","-25.05","Transaction description","2,314.07"
One of my prime issues, aside from total ignorance of what I'm doing, is that the numerical values contain commas. I've successfully written code that strips such 'buried' commas out, and I then strip the quotes so that I have a CSV line.
so I now have my data in this format
['02/03/2015', ' \t ', '241.25\t ', ' \t ', 'Transaction Details\n', '02/03/2015', ' \t ', ' \t ', '-155.49\t ', 'Transaction Details\n', '03/03/2015', ' \t ', '82.00\t ', ' \t ', 'Transaction Details\n', '03/03/2015', ' \t ', '243.25\t ', ' \t ', 'Transaction Details\n', '02/03/2015', ' \t ', '241.25\t ', ' \t ', 'Transaction Details\n']
which I believe makes it nearly ready to sort on the first element, but I think it's now one long list instead of a list of lists.
I researched sorts and found the lambda...function, so I started to implement
new_file_data = sorted(new_file_data, key=lambda item: item[0])
but element [0] was just the " at the beginning of the line.
I also noted that I needed to instruct that the date was not in, possibly, the correct format, which led me to this construct:
sorted(new_file_data, key=lambda d: datetime.strptime(d, '%d/%m/%Y'))
I loosely get the 'map' construct, but not how to combine things so that I can reference element [0], nor how to reference it as a date.
and now I'm here, hopefully someone could push me over this hurdle?
I think I need to split the list better to start with, so that each line is an element. I did at one point get a sorted result, but all the fields got globbed together: values (sorted), then dates, then words, and so on.
So if anyone could offer some advice on my failed list manipulation and on how to structure that sort lambda, I'd appreciate it.
thanks to those who have the time and know how to respond to such starter queries.
If I understand correctly you want to read the contents of the csv and sort them by date.
Given the contents of data.csv
"08/04/2015","Balance","5,804.30","Current Balance for account 123S14"
"08/04/2015","Balance","5,804.30","Available Balance for account 123S14"
"02/03/2015","241.25","Transaction description","2,620.09"
"02/03/2015","-155.49","Transaction description","2,464.60"
"03/03/2015","82.00","Transaction description","2,546.60"
"03/03/2015","243.25","Transaction description","2,789.85"
"03/03/2015","-334.81","Transaction description","2,339.12"
"04/03/2015","-25.05","Transaction description","2,314.07"
I would use the csv-module to read the data.
import csv
with open('data.csv') as f:
data = [row for row in csv.reader(f)]
Which gives:
>>> data
[['08/04/2015', 'Balance', '5,804.30', 'Current Balance for account 123S14'],
['08/04/2015', 'Balance', '5,804.30', 'Available Balance for account 123S14'],
['02/03/2015', '241.25', 'Transaction description', '2,620.09'],
['02/03/2015', '-155.49', 'Transaction description', '2,464.60'],
['03/03/2015', '82.00', 'Transaction description', '2,546.60'],
['03/03/2015', '243.25', 'Transaction description', '2,789.85'],
['03/03/2015', '-334.81', 'Transaction description', '2,339.12'],
['04/03/2015', '-25.05', 'Transaction description', '2,314.07']]
Then you can use the datetime-module to provide a key for sorting.
import datetime
sorted_data = sorted(data, key=lambda row: datetime.datetime.strptime(row[0], "%d/%m/%Y"))
Which gives:
>>> sorted_data
[['02/03/2015', '241.25', 'Transaction description', '2,620.09'],
['02/03/2015', '-155.49', 'Transaction description', '2,464.60'],
['03/03/2015', '82.00', 'Transaction description', '2,546.60'],
['03/03/2015', '243.25', 'Transaction description', '2,789.85'],
['03/03/2015', '-334.81', 'Transaction description', '2,339.12'],
['04/03/2015', '-25.05', 'Transaction description', '2,314.07'],
['08/04/2015', 'Balance', '5,804.30', 'Current Balance for account 123S14'],
['08/04/2015', 'Balance', '5,804.30', 'Available Balance for account 123S14']]
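To cover the full stated goal (merge several downloads, sort, remove duplicates), one possible sketch follows; the file handling and column layout are assumptions based on the sample above, and the file names are throwaway placeholders:

```python
import csv
import datetime
import os
import tempfile

def merge_statements(paths):
    """Merge CSV rows from several files, drop duplicates, sort by date."""
    rows = set()
    for path in paths:
        with open(path, newline='') as f:
            for row in csv.reader(f):
                rows.add(tuple(row))  # tuples are hashable, so duplicates collapse
    return sorted(rows, key=lambda r: datetime.datetime.strptime(r[0], '%d/%m/%Y'))

# Tiny demo with two throwaway files sharing one duplicate row
tmp = tempfile.mkdtemp()
paths = [os.path.join(tmp, 'a.csv'), os.path.join(tmp, 'b.csv')]
samples = [
    '"02/03/2015","241.25","Desc","2,620.09"\n"04/03/2015","-25.05","Desc","2,314.07"\n',
    '"02/03/2015","241.25","Desc","2,620.09"\n',
]
for path, text in zip(paths, samples):
    with open(path, 'w') as f:
        f.write(text)

merged = merge_statements(paths)
print(len(merged))  # 2 -- the duplicate row appears only once
```

Note that csv.reader handles the quoted fields for you, so the embedded thousands-separator commas never need to be stripped just to parse the lines.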
You can define your own sorting function.
Do a mix of these two questions and you'll have what you want (or something close):
Custom Python list sorting
Python date string to date object
In your sorting function, transform the dates from strings to datetimes and compare them:
def cmp_items(a, b):
    datetime_a = datetime.datetime.strptime(a[0], "%d/%m/%Y").date()
    datetime_b = datetime.datetime.strptime(b[0], "%d/%m/%Y").date()
    if datetime_a > datetime_b:
        return 1
    elif datetime_a == datetime_b:
        return 0
    else:
        return -1
and then you just have to sort the list using it. Note that sort() sorts in place and returns None, so don't reassign its result; also, in Python 3 the cmp argument is gone, so the comparison function must be wrapped with functools.cmp_to_key:
new_file_data.sort(cmp=cmp_items)                         # Python 2
new_file_data.sort(key=functools.cmp_to_key(cmp_items))   # Python 3
You can still have a little problem after that: elements with the same date will end up in an arbitrary order. You can improve the comparison function to compare more fields to prevent that.
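In Python 3 the usual approach is a key function rather than a comparison function, and a tuple key gives that tie-breaking for free. A sketch, with column meanings assumed from the sample data:

```python
import datetime

rows = [
    ['03/03/2015', '243.25', 'Transaction description', '2,789.85'],
    ['03/03/2015', '82.00', 'Transaction description', '2,546.60'],
    ['02/03/2015', '-155.49', 'Transaction description', '2,464.60'],
]

# Sort by date first, then by the amount column, so rows sharing a date
# still get a deterministic order
rows.sort(key=lambda r: (datetime.datetime.strptime(r[0], '%d/%m/%Y'), float(r[1])))
print([r[1] for r in rows])  # ['-155.49', '82.00', '243.25']
```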
BTW, you have not stripped the buried commas out; it seems you have completely removed the last part instead.