I have a csv file with the following data.
Column-1 Column-2 Column-3
bob sweet 4
alice uber 4.5
bob uber 4
alice sweet 4.5
razi fav 2.5
razi uber 3.5
bob fav 4
I want to convert it to a nested dictionary in Python, as shown:
A={'bob':{'sweet':'4', 'uber':'4', 'fav':'4'},
   'alice':{'uber':'4.5', 'sweet':'4.5'},
   'razi':{'fav':'2.5', 'uber':'3.5'}}
My plan was to first convert the CSV to a dictionary of lists like the one below and then build my output from it. I am unable to do so because the keys are repeated:
A={'bob':['sweet','4'],
   'alice':['uber','4.5'],
   'bob':['uber','4'],
   'alice':['sweet','4.5'],
   'razi':['fav','2.5'],
   'razi':['uber','3.5'],
   'bob':['fav','4']}
Can anyone suggest a way to solve this problem?
Assuming your fields contain no spaces, and every data row has exactly 3 fields:
import logging
logging.basicConfig(level=logging.INFO)  # <- in a real application,
                                         #    should be set application-wide
                                         #    from a config file
logger = logging.getLogger("CSV import")
result = {}
nlines = 0
ok = 0
warnings = 0
with open("my_file.csv") as f:
    f.readline()  # Skip header. Assuming only one line of heading
    for row in (line.split() for line in f):
        nlines += 1
        try:
            k1, k2, val = row
            result.setdefault(k1, {})[k2] = val
            ok += 1
        except ValueError:
            logger.warning("Format mismatch: %s", row)
            warnings += 1
            # what to do next?
logger.info("%d lines read. %d imported. %d warnings", nlines, ok, warnings)
from pprint import pprint
pprint(result)
Given your sample data file, this produces:
INFO:CSV import:7 lines read. 7 imported. 0 warnings
{'alice': {'sweet': '4.5', 'uber': '4.5'},
 'bob': {'fav': '4', 'sweet': '4', 'uber': '4'},
 'razi': {'fav': '2.5', 'uber': '3.5'}}
The trick here is to use setdefault to access the outer dictionary. It returns the existing inner dictionary if the key is already present, or inserts and returns a new empty dictionary the first time we encounter that key. After that, it is simply a matter of adding the value to the inner dictionary as usual.
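To make the setdefault behaviour concrete, here is a minimal sketch on a few in-memory rows taken from the sample data (no file involved). The names `rows`, `inner` and `alt` are only for illustration. The second half shows `collections.defaultdict` as an alternative that gives the same result; it is a swapped-in technique, not what the answer above uses.

```python
from collections import defaultdict

# A few rows from the sample data, already split into three fields.
rows = [("bob", "sweet", "4"), ("alice", "uber", "4.5"), ("bob", "uber", "4")]

# setdefault returns the inner dict for `name`, creating it on first use.
result = {}
for name, item, score in rows:
    inner = result.setdefault(name, {})
    inner[item] = score

# Alternative: defaultdict(dict) creates the inner dict automatically.
alt = defaultdict(dict)
for name, item, score in rows:
    alt[name][item] = score

print(result)
print(dict(alt) == result)
```

Both approaches build the same nested dictionary; setdefault keeps `result` a plain dict, while defaultdict silently creates entries on any missing-key lookup, which may or may not be wanted.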
I have this dictionary that takes data from a csv file:
def read_results(full_file_path):
    csv_dict = {}
    with open(full_file_path,'r') as t:
        table = t.readlines()[1:]
        for line in table:
            line = line.replace('\n', '')
            line = line.split(',')
            line = list(map(float, line))
            key = (line[1], line[3])
            if key in csv_dict:
                csv_dict[key].append((line[4], line[5], line[6]))
            else:
                csv_dict[key] = [(line[4], line[5], line[6])]
    return csv_dict
The resulting dictionary looks like this:
{(1.0, 3.0): [(602.0, 1661.0, 0.0), (945.0, 2164.0, 0.0), (141.0, 954.0, 0.0), (138.0, 913.0, 0.0),....}
But now I need to use this dictionary to create a CSV of my own that contains, for each key pair, the mean of each column of its value rows, like this:
c b first finish fix/ext
1 3 744.67 1513.67 0.67
0.8 3 88 858.67 0.67
0.8 1.5 301.5 984.5 0.5
1 1.5 419 844.5 0
I can't use any outside libraries or modules. What I have tried so far:
def Summarize_res(results):
    with open('summary_results.csv', 'w', newline='') as f:
        header = ['c','b','first','finish','fix/ext']
        f.write(str(header))
        for line in dict:
            first = sum(line[4])/len(line[4])
            finish = sum(line[5])/len(line[6])
            fix_ext = sum(line[5])/len(line[6])
I'm not sure if this is exactly what you want, but for every key it computes the mean of the corresponding values of the tuples and writes it to the file. The code can definitely be simplified, but I didn't have much time for it, sorry.
def Summarize_res(results):  # renamed from `dict` to avoid shadowing the built-in
    with open('summary_results.csv', 'w') as f:
        header = ['c','b','first','finish','fix/ext']
        f.write(','.join(str(e) for e in header))
        f.write("\n")
        for key in results:
            key1, key2 = key
            first_arr = []
            finish_arr = []
            fix_arr = []
            # collect each column of the tuples stored under this key
            for element in results[key]:
                first, finish, fix = element
                first_arr.append(first)
                finish_arr.append(finish)
                fix_arr.append(fix)
            first_final = sum(first_arr) / len(first_arr)
            finish_final = sum(finish_arr) / len(finish_arr)
            fix_final = sum(fix_arr) / len(fix_arr)
            result = [key1, key2, first_final, finish_final, fix_final]
            f.write(','.join(str(e) for e in result))
            f.write("\n")
Summarize_res(csv_dict)  # csv_dict: the dictionary returned by read_results
This part can simply be added after your existing code and it should work.
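Since the answer notes that the code can be simplified, here is one possible shorter sketch that uses only built-ins. It transposes the list of tuples with `zip(*rows)` so each column can be averaged in one expression. The helper names `column_means` and `summarize`, the rounding to two decimals, and the sample values are assumptions for illustration; the function returns the CSV text instead of writing a file so it is easy to check.

```python
def column_means(rows):
    """Return the mean of each column for a list of equal-length tuples."""
    return [sum(col) / len(col) for col in zip(*rows)]


def summarize(results):
    """Build the CSV text: one line per (c, b) key with the column means."""
    lines = ["c,b,first,finish,fix/ext"]
    for (c, b), rows in results.items():
        values = [c, b] + [round(m, 2) for m in column_means(rows)]
        lines.append(",".join(str(v) for v in values))
    return "\n".join(lines) + "\n"


# Made-up sample in the same shape as the dictionary from read_results.
sample = {(1.0, 3.0): [(602.0, 1661.0, 0.0), (945.0, 2164.0, 1.0)]}
print(summarize(sample))
```

Writing the returned string to `summary_results.csv` with a single `f.write(...)` call would then produce the file.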
I would like to search and replace IDs using two different text files.
The first file, OLDIDS.txt, looks like this:
12F
130
132
106
100
...
The other file, MASTERIDS.txt, has the old and new IDs split into columns (LEFT: old IDs, RIGHT: new IDs):
100 132
12F 1FF
106 256
... ...
What I want to do is to open OLDIDS.txt like
f2 = open('OLDIDS.txt', 'w')
and search for the ID on the first line (which is 12F), find it on the second line of MASTERIDS.txt, and write the new ID 1FF to the second line of newFile.txt.
I convert the data in your MASTERIDS.txt to a dictionary, where the key is the old ID and the value is the new ID. Then I look up the new ID in the dict using each value from OLDIDS.txt.
Demo:
with open("PATH_TO_OLDIDS.txt", "r") as src:
    data = src.readlines()
d = {}
with open("PATH_TO_MASTERIDS.txt", "r") as toReplaceSRC:
    for i in toReplaceSRC.readlines():
        val = i.split()
        d[val[0].strip()] = val[1].strip()
for i in data:
    toReplace = d.get(i.strip(), None)
    if toReplace:
        print(i.strip(), " = ", toReplace)
Output:
12F = 1FF
106 = 256
100 = 132
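The demo prints the matches; to get a list that could be written to a new file, the same lookup can be sketched as below. This works on in-memory lines so it runs without the actual files. The function names are hypothetical, and keeping an ID unchanged when it has no entry in the mapping is an assumption, since the question does not say what should happen in that case.

```python
def build_mapping(master_lines):
    """Map old ID -> new ID from lines of the form 'OLD NEW'."""
    mapping = {}
    for line in master_lines:
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            old_id, new_id = parts
            mapping[old_id] = new_id
    return mapping


def translate_ids(old_lines, mapping):
    """Replace each old ID by its new ID; unknown IDs are kept as-is (assumption)."""
    return [mapping.get(line.strip(), line.strip()) for line in old_lines]


# Sample lines copied from the question.
master = ["100 132\n", "12F 1FF\n", "106 256\n"]
old = ["12F\n", "130\n", "132\n", "106\n", "100\n"]

new_ids = translate_ids(old, build_mapping(master))
print(new_ids)
```

With real files, the lines would come from iterating over the opened files, and `"\n".join(new_ids)` could be written to the output file opened in `'w'` mode.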
I'm attempting to replace URLs and #username mentions in Twitter data using Python's regular expressions and a for loop.
d = df['text']
for i, e in enumerate(d):
    d[i] = re.sub('((www.\.[\s]+)|(https?://[^\s]+))','URL', e)
    d[i] = re.sub('#[^\s]+', 'AT_USER', e)
The problem is that the for loop only works for the second line of regex code ('AT_USER'). I want to replace the URL AND #username mentions. I was thinking of making two separate for loops for each but surely there's a more effective way?
The issue with your current code is here:
#                                  vvv
d[i] = re.sub('#[^\s]+', 'AT_USER', e)
You should be passing d[i] instead of e. The fact that you pass e means you overwrite the result of the first replacement. Change it, and it should work.
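The fix amounts to feeding the result of the first substitution into the second one. A minimal sketch with plain strings (no pandas) is shown below; the function name `clean` and the sample text are made up, and only the `https?://` branch of the question's URL pattern is used here to keep the example short.

```python
import re

URL_PATTERN = r'https?://[^\s]+'   # simplified: only the https?:// branch
MENTION_PATTERN = r'#[^\s]+'       # mention pattern as written in the question


def clean(text):
    """Apply both replacements in sequence, each on the previous result."""
    text = re.sub(URL_PATTERN, 'URL', text)
    text = re.sub(MENTION_PATTERN, 'AT_USER', text)
    return text


print(clean('#alice look at https://example.com now'))
```

Passing the original string to the second `re.sub` call instead of `text` would reproduce the behaviour described in the question, where only the last replacement survives.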
You're using pandas. It's time to ditch the loop. First, initialise a dictionary of regex-replacement pairs -
p_dict = {r'((www.\.[\s]+)|(https?://[^\s]+))' : 'URL', r'#[^\s]+' : 'AT_USER'}
Now, pass this to df.replace with the regex switch -
df['text'] = df['text'].replace(p_dict, regex=True)
Here's a little example with some dummy data -
s
0 12.2
1 12.5
2 12.6
3 15.1
4 15.3
5 15.0
dtype: object
s[0]
Out[190]: '12.2' # a string
p_dict = {'\d' : '<DIGIT>', '\.' : '<DOT>'}
s.replace(p_dict, regex=True)
0 <DIGIT><DIGIT><DOT><DIGIT>
1 <DIGIT><DIGIT><DOT><DIGIT>
2 <DIGIT><DIGIT><DOT><DIGIT>
3 <DIGIT><DIGIT><DOT><DIGIT>
4 <DIGIT><DIGIT><DOT><DIGIT>
5 <DIGIT><DIGIT><DOT><DIGIT>
dtype: object
This is a snippet of the output:
{...,"resultMap":
{..."SEARCH_RESULTS":
[{..."resultList":[
{"userClientId":"1","preferenceValues":["48","51","94"],"MyDate":"7/26/2017 8:30:00 AM"},
{"userClientId":"2","preferenceValues":["42","11","84"],"MyDate":"7/26/2017 9:40:00 AM"},
{"userClientId":"3","preferenceValues":["4","16","24"],"MyDate":"7/26/2017 4:20:00 PM"},
{"userClientId":"4","preferenceValues":["7","2","94"],"MyDate":"7/27/2017 8:00:00 AM"},
{"userClientId":"1","preferenceValues":["48","22","94"],"MyDate":"7/27/2017 1:50:00 PM"},
{"userClientId":"2","preferenceValues":["42","11"],"MyDate":"7/27/2017 2:00:00 PM"},
{"userClientId":"3","preferenceValues":["4","24"],"MyDate":"7/27/2017 6:15:00 PM"},
{"userClientId":"4","preferenceValues":"7","MyDate":"7/27/2017 9:30:00 PM"}]
}]
}
}
I am looking to get a variable pageIdCount in dictionary format, where each key is a page_id and each value is the count of occurrences of that page_id, grouped by user_id. So for userId 1 it should look like:
{"userClientId":"1","preferenceValues":{48:2, 51:1, 94:2, 22:1}}
Note that when there is only one value inside preferenceValues, there are no brackets. There is also a field "preferenceValue", which never has brackets and is otherwise identical to "preferenceValues".
Is that possible?
In Python 2.7, I specify user, password and url and then I have the following:
req = requests.post(url = url, auth=(user, password))
ans = req.json()
print ans["resultMap"]["SEARCH_RESULTS"][0]["resultList"]
Any help is greatly appreciated.
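your_data = ans["resultMap"]["SEARCH_RESULTS"][0]["resultList"]  # this is your data
final_data = {}
for line in your_data:
    uid = line["userClientId"]
    pids = line["preferenceValues"]
    if not isinstance(pids, list):  # a single value arrives without brackets
        pids = [pids]
    if uid not in final_data:
        final_data[uid] = {}
    for pid in pids:
        pid = int(pid)
        if pid not in final_data[uid]:
            final_data[uid][pid] = 0
        final_data[uid][pid] += 1
res = [{"userClientId": uid, "preferenceValues": counts} for uid, counts in final_data.items()]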
I suppose you are a beginner; if so, the trickiest part of this code will probably be the last line, which uses a list comprehension.
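For readers new to list comprehensions, the sketch below puts a comprehension of that shape next to an equivalent explicit loop. The counts in `final_data` are made up, and the field names follow the desired output shown in the question.

```python
# Made-up per-user counts, in the shape built by the counting loop.
final_data = {"1": {48: 2, 51: 1, 94: 2, 22: 1}, "2": {42: 2, 11: 2, 84: 1}}

# List comprehension: one output record per (user, counts) pair.
res = [{"userClientId": uid, "preferenceValues": counts}
       for uid, counts in final_data.items()]

# The same thing written as an explicit loop.
res_loop = []
for uid, counts in final_data.items():
    res_loop.append({"userClientId": uid, "preferenceValues": counts})

print(res == res_loop)
```

The comprehension and the loop build identical lists; the comprehension is simply the more compact spelling.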
I have a list of pairs of words, and am trying to prepare them as data for NetworkX to read. Part of the script iterates over the pairs to map them to id numbers (see code below). This code raises an "index out of range" error that I need to get past. What is the mistake here?
coocs = [['parttim;work'], ['parttim;work'],['parttim;visit'], ['parttim;site'], ['parttim;uncl'], ['parttim;home'], ['parttim;onlin']]
unique_coocs = list(set([row[0] for row in coocs])) # remove redundance
ids = list(enumerate(unique_coocs)) # creates a list of tuples with unique ids and their names for each word in the network
keys = {name: i for i, name in enumerate(unique_coocs)} # creates a dictionary(hash map) that maps each id to the words
links = [] # creates a blank list
for row in coocs: # maps all of the names in the list to their id number
    try:
        links.append({keys[row[0]]: keys[row[1]]})
    except:
        links.append({row[0]: row[1]})
The mistake occurs at row[1]: every row in coocs has length 1, so index 1 does not exist.
The corrected code is:
for row in coocs:
    links.append(row[0]+':'+str(keys[row[0]]))
print(links)
Output:
['parttim;work:2', 'parttim;work:2', 'parttim;visit:3', 'parttim;site:4', 'parttim;uncl:0', 'parttim;home:5', 'parttim;onlin:1']
This works fine:
for row in coocs: # maps all of the names in the list to their id number
    links.append({row[0]: keys[row[0]]})
>>> links
[{'parttim;work': 2}, {'parttim;work': 2}, {'parttim;visit': 3}, {'parttim;site': 4}, {'parttim;uncl': 0}, {'parttim;home': 5}, {'parttim;onlin': 1}]
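Both answers above give an id to each whole 'word;word' string. If the intent is instead to give each individual word an id and link the two ids of every pair, the strings first need to be split. The sketch below assumes that ';' separates the two words of a pair and that ids may be assigned in sorted order; neither is stated in the question. It uses a shortened copy of `coocs` and does not call NetworkX itself.

```python
coocs = [['parttim;work'], ['parttim;work'], ['parttim;visit'], ['parttim;site']]

# Split each single-string row into its two words (assumes ';' is the separator).
pairs = [row[0].split(';') for row in coocs]

# Give every distinct word an id; sorting makes the ids reproducible.
words = sorted({word for pair in pairs for word in pair})
keys = {word: i for i, word in enumerate(words)}

# One (source_id, target_id) tuple per co-occurrence.
links = [(keys[a], keys[b]) for a, b in pairs]

print(keys)
print(links)
```

A list of id tuples like `links` is a common shape for edge lists, and duplicates such as the repeated 'parttim;work' pair are kept so they can be counted or weighted later.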