Convert a matrix from string to integer - python

I'm trying to change a matrix of numbers from string to integer but it just doesn't work.
for element in list:
    for i in element:
        i = int(i)
What am I doing wrong?
Edit:
This is the whole code:
import numpy as np
t_list = []
t_list = np.array(t_list)
list_rains_per_months = [['63', '65', '50', '77', '66', '69'],
['65', '65', '67', '50', '54', '58'],
['77', '73', '80', '83', '89', '100'],
['90', '85', '90', '90', '84', '90'],
['129', '113', '120', '135', '117', '130'],
['99', '116', '114', '111', '119', '100'],
['105', '98', '112', '113', '102', '100'],
['131', '120', '111', '141', '130', '126'],
['85', '101', '88', '89', '94', '91'],
['122', '103', '119', '98', '101', '107'],
['121', '101', '104', '121', '115', '104'],
['67', '44', '58', '61', '64', '58']]
for element in t_list:
    for i in element:
        i = int(i)
I apologize for any mistakes, I'm new to python

What you're doing wrong is that you never change the list or any of its elements: inside the loop, 'i' starts out referring to an element of the list, then you rebind it to a new integer, which leaves the list untouched. (Also, avoid using 'list' as an identifier; it is the name of a built-in type, and shadowing it is asking for trouble.)
One way to do it is with list comprehensions. Assuming your matrix is a list of (inner) lists, for example:
a_list = [["3", "56", "78"], ["2", "39", "60"], ["87", "9", "71"]]
then two nested list comprehensions should do the trick:
a_list = [[int(i) for i in inner_list] for inner_list in a_list]
This builds a new list by going over your initial list and applying the change you want to each element, then saves it under another (or the same) name.
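To make the difference concrete, here is a minimal sketch contrasting the two approaches. The sample data and the names `matrix`, `unchanged` and `converted` are made up for illustration.

```python
# Hypothetical sample data; any list of lists of numeric strings would do.
matrix = [["3", "56", "78"], ["2", "39", "60"]]

# Rebinding the loop variable: the inner lists still hold strings afterwards.
for row in matrix:
    for item in row:
        item = int(item)  # only the local name 'item' changes
unchanged = [row[:] for row in matrix]

# Building a new nested list actually converts the values.
converted = [[int(item) for item in row] for row in matrix]

print(unchanged)
print(converted)
```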

In NumPy you can do it this way (for an array of numeric strings, list_rains_per_months.astype(int) is a shorter alternative):
import numpy as np
list_rains_per_months = [['63', '65', '50', '77', '66', '69'],
['65', '65', '67', '50', '54', '58'],
['77', '73', '80', '83', '89', '100'],
['90', '85', '90', '90', '84', '90'],
['129', '113', '120', '135', '117', '130'],
['99', '116', '114', '111', '119', '100'],
['105', '98', '112', '113', '102', '100'],
['131', '120', '111', '141', '130', '126'],
['85', '101', '88', '89', '94', '91'],
['122', '103', '119', '98', '101', '107'],
['121', '101', '104', '121', '115', '104'],
['67', '44', '58', '61', '64', '58']]
list_rains_per_months = np.array(list_rains_per_months)
myfunc = np.vectorize(lambda x: int(x))
list_rains_per_months = myfunc(list_rains_per_months)
print(list_rains_per_months)
Output
[[ 63 65 50 77 66 69]
[ 65 65 67 50 54 58]
[ 77 73 80 83 89 100]
[ 90 85 90 90 84 90]
[129 113 120 135 117 130]
[ 99 116 114 111 119 100]
[105 98 112 113 102 100]
[131 120 111 141 130 126]
[ 85 101 88 89 94 91]
[122 103 119 98 101 107]
[121 101 104 121 115 104]
[ 67 44 58 61 64 58]]

You could use enumerate in the loops to get the indices and assign back into the list:
matrix = [["12", "10", "0"],
          ["0", "33", "60"]]
for h, i in enumerate(matrix):
    for j, k in enumerate(i):
        matrix[h][j] = int(k)
print(matrix)

You could also just map each row's values to int:
for row in list_rains_per_months:
    row[:] = map(int, row)
Note that I assign to row[:], i.e., into the row and thus into the matrix. If I assigned to row instead, I'd have the same problem as you with your i: I'd only assign to the variable, not into the row/matrix.
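A small sketch of that distinction, using a made-up two-row matrix: plain assignment to `row` leaves the matrix alone, while slice assignment changes it in place.

```python
# Hypothetical sample matrix of numeric strings.
matrix = [["63", "65"], ["77", "73"]]

# Plain assignment rebinds the loop variable; the matrix is untouched.
for row in matrix:
    row = list(map(int, row))
after_rebinding = [row[:] for row in matrix]

# Slice assignment replaces the contents of each existing row object.
for row in matrix:
    row[:] = map(int, row)

print(after_rebinding)
print(matrix)
```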

Related

How to transform index values into columns using Pandas?

I have a dictionary like this:
my_dict = {'RuleSet': {'0': {'RuleSetID': '0',
'RuleSetName': 'Allgemein',
'Rules': [{'RulesID': '10',
'RuleName': 'Gemeinde Seiten',
'GroupHits': '2',
'KeyWordGroups': ['100', '101', '102']}]},
'1': {'RuleSetID': '1',
'RuleSetName': 'Portale Berlin',
'Rules': [{'RulesID': '11',
'RuleName': 'Portale Berlin',
'GroupHits': '4',
'KeyWordGroups': ['100', '101', '102', '107']}]},
'6': {'RuleSetID': '6',
'RuleSetName': 'Zwangsvollstr. Berlin',
'Rules': [{'RulesID': '23',
'RuleName': 'Zwangsvollstr. Berlin',
'GroupHits': '1',
'KeyWordGroups': ['100', '101']}]}}}
When using this code snippet it can be transformed into a dataframe:
rules_pd = pd.DataFrame(my_dict['RuleSet'])
rules_pd
The result is:
I would like to make it look like this:
Does anyone know how to tackle this challenge?
Use from_dict with the 'index' orientation:
out = pd.DataFrame.from_dict(my_dict['RuleSet'],'index')
Out[692]:
RuleSetID ... Rules
0 0 ... [{'RulesID': '10', 'RuleName': 'Gemeinde Seite...
1 1 ... [{'RulesID': '11', 'RuleName': 'Portale Berlin...
6 6 ... [{'RulesID': '23', 'RuleName': 'Zwangsvollstr....
[3 rows x 3 columns]
#out.columns
#Out[693]: Index(['RuleSetID', 'RuleSetName', 'Rules'], dtype='object')
You could also try transpose():
rules_pd = pd.DataFrame(my_dict['RuleSet']).transpose()
print(rules_pd)

Python - BeautifulSoup - Iterating through findall by specific elements in list

I'm pretty new to the world of web scraping so am looking for some guidance to an issue I've been trying to resolve for a few hours.
I'm trying to loop through a table looking structure (it's not an actual table though) and have used findall to bring back all the details of a certain tag.
The challenge is that every element of the "table" has the same class name, "final-leaderboard__content", so I'm left with one huge list. I want to iterate through it and retrieve the details so I can create a CSV/Excel file. This is the code:
from bs4 import BeautifulSoup
import requests
TournamentURL = "https://www.theopen.com/previous-opens/19th-open-st-andrews-1879/"
TournamentResponse = requests.get(TournamentURL)
TournamentData = TournamentResponse.text
TournamentSoup = BeautifulSoup(TournamentData, 'html.parser')
RowContents = TournamentSoup.findAll("div", {"class": "final-leaderboard__content"})
for RowContent in RowContents:
The result is something like this and I can't work out the best way without there being any explicit tag/id to know that item 0,8,16 etc is the Player Name, item 1,9,17 is the Finish etc etc
[0] - Name
[1] - Finish
[2] - R1
[3] - R2
[4] - R3
[5] - R4
[6] - Total
[7] - Par
[8] - Name (The second Name)
[9] - Finish (The second Finish)
etc
etc
I've tried splice, modulo and various other variants of the same but can't seem to work it out.
You can use the fact that this is indeed a kind of tabular data and grab all divs that represent a row, split it by a number of columns, and there's your data:
import requests
from bs4 import BeautifulSoup
from tabulate import tabulate
url = "https://www.theopen.com/previous-opens/19th-open-st-andrews-1879/#leaderboard"
page = requests.get(url).content
leaderboard = BeautifulSoup(page, "html.parser").find_all("div", {"class": "final-leaderboard__content"})
column_count = 8
split_by_columns = [
    leaderboard[i:i+column_count] for i in range(0, len(leaderboard), column_count)
]
table = [[i.getText(strip=True) for i in row] for row in split_by_columns]
print(tabulate(table[1:], headers=table[0]))
Output:
Name Finish R1 R2 R3 R4 Total Par
----------------------------- -------- ---- ---- ---- ---- ------- -----
Jamie ANDERSONChampion Golfer 1 84 85 - - 169 M/C
Andrew KIRKALDY 2 86 86 - - 172 M/C
Jamie ALLAN 2 88 84 - - 172 M/C
George PAXTON 4 89 85 - - 174 M/C
Tom KIDD 5 87 88 - - 175 M/C
Bob FERGUSON 6 89 87 - - 176 M/C
J.O.F. MORRIS 7 92 87 - - 179 M/C
Jack KIRKALDY 8 92 89 - - 181 M/C
James RENNIE 8 93 88 - - 181 M/C
Willie FERNIE 8 92 89 - - 181 M/C
David AYTON 11 95 89 - - 184 M/C
Henry LAMB 11 91 93 - - 184 M/C
Tom ARUNDEL 11 95 89 - - 184 M/C
Tom MORRIS SR 14 92 93 - - 185 M/C
William DOLEMAN 14 91 94 - - 185 M/C
Robert KINSMAN 14 88 97 - - 185 M/C
Bob MARTIN 17 93 93 - - 186 M/C
Ben SAYERS 18 92 95 - - 187 M/C
David ANDERSON SR 19 94 94 - - 188 M/C
David CORSTORPHINE 20 93 96 - - 189 M/C
Tom DUNN 20 90 99 - - 189 M/C
Peter PAXTON 20 99 90 - - 189 M/C
[A] SMITH 20 94 95 - - 189 M/C
D. GRANT 20 95 94 - - 189 M/C
Bob DOW 20 95 94 - - 189 M/C
Walter GOURLAY 20 92 97 - - 189 M/C
A.W. SMITH 27 91 99 - - 190 M/C
Douglas Argyll ROBERTSON 27 97 93 - - 190 M/C
Robert ARMIT 29 95 96 - - 191 M/C
George STRATH 29 97 94 - - 191 M/C
J.H. BLACKWELL 31 96 96 - - 192 M/C
Tom MANZIE 32 96 97 - - 193 M/C
George LOWE 33 94 100 - - 194 M/C
G. HONEYMAN 33 97 97 - - 194 M/C
James FENTON 35 99 97 - - 196 M/C
Robert TAIT 35 99 97 - - 196 M/C
Bob KIRK 37 99 98 - - 197 M/C
Rev. D. LUNDIE 37 98 99 - - 197 M/C
Fitz BOOTHBY 39 96 102 - - 198 M/C
J. Thomson WHITE 40 102 99 - - 201 M/C
James KIRK 41 105 97 - - 202 M/C
W.H. GOFF 42 105 99 - - 204 M/C
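The chunking step can be tried without any network access. The sketch below uses a made-up flat list of three columns in place of the scraped divs; only the slicing idea is taken from the answer above.

```python
# Hypothetical flat list standing in for the text of the scraped divs.
cells = ["Name", "Finish", "Total",
         "Jamie ANDERSON", "1", "169",
         "Andrew KIRKALDY", "2", "172"]
column_count = 3  # the real leaderboard has 8 columns

# Slice the flat list into consecutive chunks of column_count items.
rows = [cells[i:i + column_count] for i in range(0, len(cells), column_count)]
header, body = rows[0], rows[1:]

print(header)
for row in body:
    print(row)
```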
import requests
from bs4 import BeautifulSoup
def parse_row(row):
    for div in row.find_all("div", {"class": "final-leaderboard__content"}):
        yield div.text.strip().replace('\n', ' ')
url = "https://www.theopen.com/previous-opens/19th-open-st-andrews-1879/#leaderboard"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find("div", {"class": "final-leaderboard__table"})
rows = table.find_all('div', {'class':"final-leaderboard__row"})
header = list(parse_row(rows[0]))
for row in rows[1:]:
    print(dict(zip(header, list(parse_row(row)))))
Output:
{'Name': 'Jamie ANDERSON Champion Golfer', 'Finish': '1', 'R1': '84', 'R2': '85', 'R3': '-', 'R4': '-', 'Total': '169', 'Par': 'M/C'}
{'Name': 'Andrew KIRKALDY', 'Finish': '2', 'R1': '86', 'R2': '86', 'R3': '-', 'R4': '-', 'Total': '172', 'Par': 'M/C'}
{'Name': 'Jamie ALLAN', 'Finish': '2', 'R1': '88', 'R2': '84', 'R3': '-', 'R4': '-', 'Total': '172', 'Par': 'M/C'}
{'Name': 'George PAXTON', 'Finish': '4', 'R1': '89', 'R2': '85', 'R3': '-', 'R4': '-', 'Total': '174', 'Par': 'M/C'}
{'Name': 'Tom KIDD', 'Finish': '5', 'R1': '87', 'R2': '88', 'R3': '-', 'R4': '-', 'Total': '175', 'Par': 'M/C'}
{'Name': 'Bob FERGUSON', 'Finish': '6', 'R1': '89', 'R2': '87', 'R3': '-', 'R4': '-', 'Total': '176', 'Par': 'M/C'}
{'Name': 'J.O.F. MORRIS', 'Finish': '7', 'R1': '92', 'R2': '87', 'R3': '-', 'R4': '-', 'Total': '179', 'Par': 'M/C'}
{'Name': 'Jack KIRKALDY', 'Finish': '8', 'R1': '92', 'R2': '89', 'R3': '-', 'R4': '-', 'Total': '181', 'Par': 'M/C'}
{'Name': 'James RENNIE', 'Finish': '8', 'R1': '93', 'R2': '88', 'R3': '-', 'R4': '-', 'Total': '181', 'Par': 'M/C'}
{'Name': 'Willie FERNIE', 'Finish': '8', 'R1': '92', 'R2': '89', 'R3': '-', 'R4': '-', 'Total': '181', 'Par': 'M/C'}
{'Name': 'David AYTON', 'Finish': '11', 'R1': '95', 'R2': '89', 'R3': '-', 'R4': '-', 'Total': '184', 'Par': 'M/C'}
{'Name': 'Henry LAMB', 'Finish': '11', 'R1': '91', 'R2': '93', 'R3': '-', 'R4': '-', 'Total': '184', 'Par': 'M/C'}
{'Name': 'Tom ARUNDEL', 'Finish': '11', 'R1': '95', 'R2': '89', 'R3': '-', 'R4': '-', 'Total': '184', 'Par': 'M/C'}
{'Name': 'Tom MORRIS SR', 'Finish': '14', 'R1': '92', 'R2': '93', 'R3': '-', 'R4': '-', 'Total': '185', 'Par': 'M/C'}
{'Name': 'William DOLEMAN', 'Finish': '14', 'R1': '91', 'R2': '94', 'R3': '-', 'R4': '-', 'Total': '185', 'Par': 'M/C'}
{'Name': 'Robert KINSMAN', 'Finish': '14', 'R1': '88', 'R2': '97', 'R3': '-', 'R4': '-', 'Total': '185', 'Par': 'M/C'}
{'Name': 'Bob MARTIN', 'Finish': '17', 'R1': '93', 'R2': '93', 'R3': '-', 'R4': '-', 'Total': '186', 'Par': 'M/C'}
{'Name': 'Ben SAYERS', 'Finish': '18', 'R1': '92', 'R2': '95', 'R3': '-', 'R4': '-', 'Total': '187', 'Par': 'M/C'}
{'Name': 'David ANDERSON SR', 'Finish': '19', 'R1': '94', 'R2': '94', 'R3': '-', 'R4': '-', 'Total': '188', 'Par': 'M/C'}
{'Name': 'David CORSTORPHINE', 'Finish': '20', 'R1': '93', 'R2': '96', 'R3': '-', 'R4': '-', 'Total': '189', 'Par': 'M/C'}
{'Name': 'Tom DUNN', 'Finish': '20', 'R1': '90', 'R2': '99', 'R3': '-', 'R4': '-', 'Total': '189', 'Par': 'M/C'}
{'Name': 'Peter PAXTON', 'Finish': '20', 'R1': '99', 'R2': '90', 'R3': '-', 'R4': '-', 'Total': '189', 'Par': 'M/C'}
{'Name': '[A] SMITH', 'Finish': '20', 'R1': '94', 'R2': '95', 'R3': '-', 'R4': '-', 'Total': '189', 'Par': 'M/C'}
{'Name': 'D. GRANT', 'Finish': '20', 'R1': '95', 'R2': '94', 'R3': '-', 'R4': '-', 'Total': '189', 'Par': 'M/C'}
{'Name': 'Bob DOW', 'Finish': '20', 'R1': '95', 'R2': '94', 'R3': '-', 'R4': '-', 'Total': '189', 'Par': 'M/C'}
{'Name': 'Walter GOURLAY', 'Finish': '20', 'R1': '92', 'R2': '97', 'R3': '-', 'R4': '-', 'Total': '189', 'Par': 'M/C'}
{'Name': 'A.W. SMITH', 'Finish': '27', 'R1': '91', 'R2': '99', 'R3': '-', 'R4': '-', 'Total': '190', 'Par': 'M/C'}
{'Name': 'Douglas Argyll ROBERTSON', 'Finish': '27', 'R1': '97', 'R2': '93', 'R3': '-', 'R4': '-', 'Total': '190', 'Par': 'M/C'}
{'Name': 'Robert ARMIT', 'Finish': '29', 'R1': '95', 'R2': '96', 'R3': '-', 'R4': '-', 'Total': '191', 'Par': 'M/C'}
{'Name': 'George STRATH', 'Finish': '29', 'R1': '97', 'R2': '94', 'R3': '-', 'R4': '-', 'Total': '191', 'Par': 'M/C'}
{'Name': 'J.H. BLACKWELL', 'Finish': '31', 'R1': '96', 'R2': '96', 'R3': '-', 'R4': '-', 'Total': '192', 'Par': 'M/C'}
{'Name': 'Tom MANZIE', 'Finish': '32', 'R1': '96', 'R2': '97', 'R3': '-', 'R4': '-', 'Total': '193', 'Par': 'M/C'}
{'Name': 'George LOWE', 'Finish': '33', 'R1': '94', 'R2': '100', 'R3': '-', 'R4': '-', 'Total': '194', 'Par': 'M/C'}
{'Name': 'G. HONEYMAN', 'Finish': '33', 'R1': '97', 'R2': '97', 'R3': '-', 'R4': '-', 'Total': '194', 'Par': 'M/C'}
{'Name': 'James FENTON', 'Finish': '35', 'R1': '99', 'R2': '97', 'R3': '-', 'R4': '-', 'Total': '196', 'Par': 'M/C'}
{'Name': 'Robert TAIT', 'Finish': '35', 'R1': '99', 'R2': '97', 'R3': '-', 'R4': '-', 'Total': '196', 'Par': 'M/C'}
{'Name': 'Bob KIRK', 'Finish': '37', 'R1': '99', 'R2': '98', 'R3': '-', 'R4': '-', 'Total': '197', 'Par': 'M/C'}
{'Name': 'Rev. D. LUNDIE', 'Finish': '37', 'R1': '98', 'R2': '99', 'R3': '-', 'R4': '-', 'Total': '197', 'Par': 'M/C'}
{'Name': 'Fitz BOOTHBY', 'Finish': '39', 'R1': '96', 'R2': '102', 'R3': '-', 'R4': '-', 'Total': '198', 'Par': 'M/C'}
{'Name': 'J. Thomson WHITE', 'Finish': '40', 'R1': '102', 'R2': '99', 'R3': '-', 'R4': '-', 'Total': '201', 'Par': 'M/C'}
{'Name': 'James KIRK', 'Finish': '41', 'R1': '105', 'R2': '97', 'R3': '-', 'R4': '-', 'Total': '202', 'Par': 'M/C'}
{'Name': 'W.H. GOFF', 'Finish': '42', 'R1': '105', 'R2': '99', 'R3': '-', 'R4': '-', 'Total': '204', 'Par': 'M/C'}
Of course, instead of a dict you may use another data structure, such as a namedtuple.
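A rough sketch of the namedtuple variant, assuming the header cells are valid Python identifiers (as "Name", "Finish" and so on are). The sample rows and the type name `Result` are invented for the example.

```python
from collections import namedtuple

# Hypothetical header and rows, shaped like the output of parse_row above.
header = ["Name", "Finish", "Total"]
rows = [["Jamie ANDERSON", "1", "169"],
        ["Andrew KIRKALDY", "2", "172"]]

# Field names must be valid Python identifiers; these headers already are.
Result = namedtuple("Result", header)
results = [Result(*row) for row in rows]

for result in results:
    print(result.Name, result.Total)
```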
Another way is to create a dictionary, enumerate through your RowContents, and update the dictionary with the enumerated index modulo 8 (i % 8) as the key and the element's text as the value:
RowContents = TournamentSoup.findAll("div", {"class": "final-leaderboard__content"})
d = {}
for i, RowContent in enumerate(RowContents):
    key = i % 8
    d.setdefault(key, []).append(' '.join(RowContent.text.strip().split()))
>>> d
{
0: ['Name','Jamie ANDERSON Champion Golfer','Andrew KIRKALDY','Jamie ALLAN',....]
1: ['Finish','1','2','2','4',....]
2: ['R1','84','86','88','89','87',....]
.......
7: ['Par','M/C','M/C','M/C','M/C','M/C',.....]
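For experimenting offline, the modulo grouping can be sketched on a made-up flat list with two columns. The final `zip` step, which turns the per-column lists back into rows, is an addition for illustration and not part of the answer.

```python
# Hypothetical flat list of cell texts; the real page has 8 columns.
cells = ["Name", "Finish", "Jamie ANDERSON", "1", "Andrew KIRKALDY", "2"]
column_count = 2

# Group every cell under its column index (position modulo column count).
columns = {}
for i, cell in enumerate(cells):
    columns.setdefault(i % column_count, []).append(cell)

# zip(*...) turns the per-column lists back into per-row tuples.
rows = list(zip(*(columns[k] for k in sorted(columns))))

print(columns)
print(rows)
```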
If you can use pandas:
import pandas as pd
df = pd.DataFrame(d)
df = df.rename(columns=df.iloc[0]).drop(df.index[0])
>>> print(df)
Name Finish R1 R2 R3 R4 Total Par
1 Jamie ANDERSON Champion Golfer 1 84 85 - - 169 M/C
2 Andrew KIRKALDY 2 86 86 - - 172 M/C
3 Jamie ALLAN 2 88 84 - - 172 M/C
4 George PAXTON 4 89 85 - - 174 M/C
5 Tom KIDD 5 87 88 - - 175 M/C
6 Bob FERGUSON 6 89 87 - - 176 M/C
7 J.O.F. MORRIS 7 92 87 - - 179 M/C
To save a DataFrame to CSV, use DataFrame.to_csv():
df.to_csv('yourfile.csv', index=False)

How to read a JSON retrieved from an API and save it into a CSV file?

I am using a weather API that responds with JSON. Here is a sample of the returned readings:
{
'data': {
'request': [{
'type': 'City',
'query': 'Karachi, Pakistan'
}],
'weather': [{
'date': '2019-03-10',
'astronomy': [{
'sunrise': '06:46 AM',
'sunset': '06:38 PM',
'moonrise': '09:04 AM',
'moonset': '09:53 PM',
'moon_phase': 'Waxing Crescent',
'moon_illumination': '24'
}],
'maxtempC': '27',
'maxtempF': '80',
'mintempC': '22',
'mintempF': '72',
'totalSnow_cm': '0.0',
'sunHour': '11.6',
'uvIndex': '7',
'hourly': [{
'time': '24',
'tempC': '27',
'tempF': '80',
'windspeedMiles': '10',
'windspeedKmph': '16',
'winddirDegree': '234',
'winddir16Point': 'SW',
'weatherCode': '116',
'weatherIconUrl': [{
'value': 'http://cdn.worldweatheronline.net/images/wsymbols01_png_64/wsymbol_0002_sunny_intervals.png'
}],
'weatherDesc': [{
'value': 'Partly cloudy'
}],
'precipMM': '0.0',
'humidity': '57',
'visibility': '10',
'pressure': '1012',
'cloudcover': '13',
'HeatIndexC': '25',
'HeatIndexF': '78',
'DewPointC': '15',
'DewPointF': '59',
'WindChillC': '24',
'WindChillF': '75',
'WindGustMiles': '12',
'WindGustKmph': '19',
'FeelsLikeC': '25',
'FeelsLikeF': '78',
'uvIndex': '0'
}]
}]
}
}
I used the following Python code in my attempt to read the data stored in the JSON file:
import simplejson as json
data_file = open("new.json", "r")
values = json.load(data_file)
But this fails with the following error:
JSONDecodeError: Expecting value: line 1 column 1 (char 0) error
I am also wondering how I can save the result in a structured format in a CSV file using Python.
As stated below by Rami, the simplest way to do this would be to use pandas, either with pd.read_json() or with pd.DataFrame.from_dict(). However, the issue in this particular case is that you have a nested dictionary/JSON. What do I mean by nested? Well, if you were to simply put this into a DataFrame, you'd have this:
print (df)
request weather
0 {'type': 'City', 'query': 'Karachi, Pakistan'} {'date': '2019-03-10', 'astronomy': [{'sunrise...
Which is fine if that's what you want. However, I am assuming you'd like all the data for each instance flattened into a single row.
So you'll need to either use json_normalize to unravel it (which is possible, but you'd need to be certain the JSON follows the same format/keys throughout, and you'd still need to pull out each of the dictionaries within the lists within the dictionaries), or use some function to flatten out the nested JSON. From there you can simply write to file.
I chose to flatten it using a function, then construct the DataFrame:
import re
import pandas as pd
data = {'data': {'request': [{'type': 'City', 'query': 'Karachi, Pakistan'}], 'weather': [{'date': '2019-03-10', 'astronomy': [{'sunrise': '06:46 AM', 'sunset': '06:38 PM', 'moonrise': '09:04 AM', 'moonset': '09:53 PM', 'moon_phase': 'Waxing Crescent', 'moon_illumination': '24'}], 'maxtempC': '27', 'maxtempF': '80', 'mintempC': '22', 'mintempF': '72', 'totalSnow_cm': '0.0', 'sunHour': '11.6', 'uvIndex': '7', 'hourly': [{'time': '24', 'tempC': '27', 'tempF': '80', 'windspeedMiles': '10', 'windspeedKmph': '16', 'winddirDegree': '234', 'winddir16Point': 'SW', 'weatherCode': '116', 'weatherIconUrl': [{'value': 'http://cdn.worldweatheronline.net/images/wsymbols01_png_64/wsymbol_0002_sunny_intervals.png'}], 'weatherDesc': [{'value': 'Partly cloudy'}], 'precipMM': '0.0', 'humidity': '57', 'visibility': '10', 'pressure': '1012', 'cloudcover': '13', 'HeatIndexC': '25', 'HeatIndexF': '78', 'DewPointC': '15', 'DewPointF': '59', 'WindChillC': '24', 'WindChillF': '75', 'WindGustMiles': '12', 'WindGustKmph': '19', 'FeelsLikeC': '25', 'FeelsLikeF': '78', 'uvIndex': '0'}]}]}}
def flatten_json(y):
    """Flatten nested dicts/lists into one dict with '_'-joined keys."""
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

flat = flatten_json(data['data'])
results = pd.DataFrame()
special_cols = []
columns_list = list(flat.keys())
for item in columns_list:
    try:
        # The first list index in the key becomes the row number.
        row_idx = re.findall(r'\_(\d+)\_', item)[0]
    except IndexError:
        # Keys without a list index apply to every row.
        special_cols.append(item)
        continue
    column = re.findall(r'\_\d+\_(.*)', item)[0]
    column = column.replace('_', '')
    row_idx = int(row_idx)
    value = flat[item]
    results.loc[row_idx, column] = value
for item in special_cols:
    results[item] = flat[item]
results.to_csv('path/filename.csv', index=False)
Output:
print (results.to_string())
type query date astronomy0sunrise astronomy0sunset astronomy0moonrise astronomy0moonset astronomy0moonphase astronomy0moonillumination maxtempC maxtempF mintempC mintempF totalSnowcm sunHour uvIndex hourly0time hourly0tempC hourly0tempF hourly0windspeedMiles hourly0windspeedKmph hourly0winddirDegree hourly0winddir16Point hourly0weatherCode hourly0weatherIconUrl0value hourly0weatherDesc0value hourly0precipMM hourly0humidity hourly0visibility hourly0pressure hourly0cloudcover hourly0HeatIndexC hourly0HeatIndexF hourly0DewPointC hourly0DewPointF hourly0WindChillC hourly0WindChillF hourly0WindGustMiles hourly0WindGustKmph hourly0FeelsLikeC hourly0FeelsLikeF hourly0uvIndex
0 City Karachi, Pakistan 2019-03-10 06:46 AM 06:38 PM 09:04 AM 09:53 PM Waxing Crescent 24 27 80 22 72 0.0 11.6 7 24 27 80 10 16 234 SW 116 http://cdn.worldweatheronline.net/images/wsymb... Partly cloudy 0.0 57 10 1012 13 25 78 15 59 24 75 12 19 25 78 0
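If pandas is not available, roughly the same idea can be sketched with only the standard library. The `flatten` helper and the tiny `sample` dictionary below are assumptions made for illustration, not the API's real payload, and the CSV goes to an in-memory buffer rather than a file.

```python
import csv
import io

def flatten(value, prefix=""):
    """Return a flat dict whose keys join nested dict keys and list indices with '_'."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix[:-1]: value}
    flat = {}
    for key, child in items:
        flat.update(flatten(child, f"{prefix}{key}_"))
    return flat

# Hypothetical, much smaller stand-in for the API response.
sample = {"request": [{"type": "City"}],
          "weather": [{"date": "2019-03-10", "hourly": [{"tempC": "27"}]}]}

row = flatten(sample)

# Write to an in-memory buffer; pass an open file instead to save to disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

Unlike the pandas version, this keeps the underscores and list indices in the column names, which makes it easier to trace a column back to its place in the JSON.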

Print nested dictionary in python and export all on a csv file

I have a dictionary like this:
{'https://github.com/project1': {'Batchfile': '91', 'Gradle': '110', 'INI': '25', 'Java': '1879', 'Markdown': '393', 'QMake': '52', 'Shell': '161', 'Text': '202', 'XML': '943'}}
{'https://github.com/project2': {'Batchfile': '91', 'Gradle': '123', 'INI': '25', 'Java': '1305', 'Markdown': '121', 'QMake': '52', 'Shell': '161', 'XML': '234'}}
{'https://github.com/project3': {'Batchfile': '91', 'Gradle': '360', 'INI': '27', 'Java': '805', 'Markdown': '27', 'QMake': '156', 'Shell': '161', 'XML': '380'}}
It is structured in this way:
{'url': {'lang1': 'locs', 'lang2': 'locs', ...}}
{'url2': {'lang6': 'locs', 'lang5': 'locs', ...}}
where lang stands for a language and locs for the lines of code in that language.
What I want to do is print this dictionary in a pretty way, so I can see the results before the export.
After that I want to export the dictionary to a CSV file for further processing. The problem is that the languages are not sorted, and different projects have different sets of languages. This is what I mean:
{'https://github.com/Project4': {'HTML': '29', 'Java': '229', 'Markdown': '101', 'Maven POM': '88', 'XML': '62'}}
{'https://github.com/Project5': {'Batchfile': '85', 'Gradle': '84', 'INI': '22', 'Java': '2422', 'Markdown': '25', 'Prolog': '25', 'Shell': '173', 'XML': '3243', 'YAML': '43'}}
Any idea?
You could use pandas:
import pandas as pd
t = [{'https://github.com/project1': {'Batchfile': '91', 'Gradle': '110', 'INI': '25', 'Java': '1879', 'Markdown': '393', 'QMake': '52', 'Shell': '161', 'Text': '202', 'XML': '943'}},
{'https://github.com/project2': {'Batchfile': '91', 'Gradle': '123', 'INI': '25', 'Java': '1305', 'Markdown': '121', 'QMake': '52', 'Shell': '161', 'XML': '234'}},
{'https://github.com/project3': {'Batchfile': '91', 'Gradle': '360', 'INI': '27', 'Java': '805', 'Markdown': '27', 'QMake': '156', 'Shell': '161', 'XML': '380'}}]
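# Union of all languages; a set has no fixed order, so use sorted(...) instead of list(...) for stable columns.
columns = list({lang for x in t for l in x.values() for lang in l})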
index = [p for x in t for p in x.keys()]
rows = [l for x in t for l in x.values() ]
df = pd.DataFrame(rows, columns=columns, index=index).fillna('N/A')
df.to_csv('projects.csv')
Which gives:
>>> df
Gradle INI Markdown ... Batchfile Java QMake
https://github.com/project1 110 25 393 ... 91 1879 52
https://github.com/project2 123 25 121 ... 91 1305 52
https://github.com/project3 360 27 27 ... 91 805 156
[3 rows x 9 columns]
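A pandas-free sketch of the same export, assuming a shortened, made-up version of the project dictionaries. It uses `csv.DictWriter` with `restval` to fill in languages a project lacks, and writes to an in-memory buffer so it can be run without touching the disk.

```python
import csv
import io

# Hypothetical, shortened version of the project dictionaries.
projects = [
    {"https://github.com/project1": {"Java": "1879", "Text": "202", "XML": "943"}},
    {"https://github.com/project2": {"Java": "1305", "XML": "234"}},
]

# Sorted union of every language seen in any project.
languages = sorted({lang for p in projects for locs in p.values() for lang in locs})

buffer = io.StringIO()  # swap for open("projects.csv", "w", newline="") to write a file
writer = csv.DictWriter(buffer, fieldnames=["url"] + languages, restval="N/A")
writer.writeheader()
for project in projects:
    for url, locs in project.items():
        writer.writerow({"url": url, **locs})

print(buffer.getvalue())
```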

sorting by dictionary value in array python

Okay so I've been working on processing some annotated text output. What I have so far is a dictionary with the annotation as key and its relations as an array of elements:
{'Adenotonsillectomy': ['0', '18', '1869', '1716'],
'OSAS': ['57', '61'],
'apnea': ['41', '46'],
'can': ['94', '97', '1796', '1746'],
'deleterious': ['103', '114'],
'effects': ['122', '129', '1806', '1752'],
'for': ['19', '22'],
'gain': ['82', '86', '1776', '1734'],
'have': ['98', '102', ['1776 1786 1796 1806 1816'], '1702'],
'health': ['115', '121'],
'lead': ['67', '71', ['1869 1879 1889'], '1695'],
'leading': ['135', '142', ['1842 1852'], '1709'],
'may': ['63', '66', '1879', '1722'],
'obesity': ['146', '153'],
'obstructive': ['23', '34'],
'sleep': ['35', '40'],
'syndrome': ['47', '55'],
'to': ['143', '145', '1852', '1770'],
'weight': ['75', '81'],
'when': ['130', '134', '1842', '1758'],
'which': ['88', '93', '1786', '1740']}
What I want to do is sort this by the first element in the array and reorder the dict as:
'Adenotonsillectomy': ['0', '18', '1869', '1716']
'for': ['19', '22'],
'obstructive': ['23', '34'],
'sleep': ['35', '40'],
'apnea': ['41', '46'],
etc...
Right now I've tried to sort by value:
sorted(dependency_dict.items(), key=lambda x: x[1][0])
However the output I'm getting is still incorrect:
[('Adenotonsillectomy', ['0', '18', '1869', '1716']),
('deleterious', ['103', '114']),
('health', ['115', '121']),
('effects', ['122', '129', '1806', '1752']),
('when', ['130', '134', '1842', '1758']),
('leading', ['135', '142', ['1842 1852'], '1709']),
('to', ['143', '145', '1852', '1770']),
('obesity', ['146', '153']),
('for', ['19', '22']),
('obstructive', ['23', '34']),
('sleep', ['35', '40']),
('apnea', ['41', '46']),
('syndrome', ['47', '55']),
('OSAS', ['57', '61']),
('may', ['63', '66', '1879', '1722']),
('lead', ['67', '71', ['1869 1879 1889'], '1695']),
('weight', ['75', '81']),
('gain', ['82', '86', '1776', '1734']),
('which', ['88', '93', '1786', '1740']),
('can', ['94', '97', '1796', '1746']),
('have', ['98', '102', ['1776 1786 1796 1806 1816'], '1702'])]
I'm not sure what's going wrong. Any help is appreciated.
The first elements are strings, so the entries are sorted lexicographically, character by character (which is why '103' comes before '19'). If you want to sort them by integer value, convert the value to int first:
sorted(dependency_dict.items(), key=lambda x: int(x[1][0]))
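A short sketch of the difference, using a made-up three-entry excerpt of the dictionary. Rebuilding the dict from the sorted pairs relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 on.

```python
# Hypothetical three-entry excerpt of the dictionary from the question.
dependency_dict = {
    "deleterious": ["103", "114"],
    "for": ["19", "22"],
    "Adenotonsillectomy": ["0", "18", "1869", "1716"],
}

# String keys compare character by character, so '103' sorts before '19'.
by_string = sorted(dependency_dict.items(), key=lambda kv: kv[1][0])

# Converting to int gives numeric order.
by_number = sorted(dependency_dict.items(), key=lambda kv: int(kv[1][0]))

# Dicts keep insertion order (Python 3.7+), so this yields a reordered dict.
ordered = dict(by_number)

print([key for key, _ in by_string])
print(list(ordered))
```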
