I am new to Python and data mining.
Let's say I have a list of strings called data:
data[0] = ['I want to make everything lowercase']
data[1] = ['How Do I Do It']
data[2] = ['With A Large DataSet']
and so on. len(data) gives 50000.
I have tried
{k.lower(): v for k, v in data.items()}
and it gives me an error saying that 'list' object has no attribute 'items'.
I have also tried using .lower(), and it gives me the same AttributeError.
How do I call the lower() function on everything in data[:50000] so that all of the strings in data become lowercase?
EDIT:
For more details: I have a JSON file with data such as:
{'review/a': 1.0, 'review/b': 2.0, 'review/c': 'This IS the PART where I want to make all loWerCASE'}
Then I call a function to get the specific reviews that I want to make lowercase:
def lowerCase(datum):
    feat = [datum['review/c']]
    return feat
lowercase = [lowerCase(d) for d in data]
Now I have all the 'review/c' information in my lowercase list.
I want to make all of those strings lowercase.
If your data is a list of strings like this:
data = ['I want to make everything lowercase', '', '']
data = [k.lower() for k in data]
If your data is a list of lists of strings:
data = [['I want to make everything lowercase'], ['']]
data = [[k.lower() for k in l] for l in data]
The error occurs because a list has no 'items' attribute; only a dict does.
You need a list comprehension, not a dict comprehension:
lowercase_data = [v.lower() for v in data]
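For the case described in the question's edit, where each record is a dict and only the 'review/c' text needs lowercasing, a minimal sketch might look like this. The sample records are made up for illustration; only the 'review/c' key comes from the question.

```python
# Hypothetical sample records; only the 'review/c' key comes from the question.
data = [
    {'review/a': 1.0, 'review/b': 2.0, 'review/c': 'This IS the PART'},
    {'review/a': 3.0, 'review/b': 4.0, 'review/c': 'How Do I Do It'},
]

# Pull out the review text and lowercase it in a single pass.
lowercase = [d['review/c'].lower() for d in data]
print(lowercase)
```

This skips the intermediate one-element lists that the lowerCase helper builds, so each entry is a plain string that .lower() can be called on.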
For some reason x.split(':', 1)[-1] doesn't do anything. Could someone explain and maybe help me?
I'm trying to remove the data before : (including ":") but it keeps that data anyway
Code
data = { 'state': 1, 'endTime': 1518852709307, 'fileSize': 000000 }
data = data.strip('{}')
data = data.split(',')
for x in data:
    x.split(':', 1)[-1]
    print(x)
Output
"state":1
"endTime":1518852709307
"fileSize":16777216
It's a dictionary, not a list of strings.
I think this is what you're looking for:
data = str({"state":1,"endTime":1518852709307,"fileSize":000000})  # add a str() here
data = data.strip('{}')
data = data.split(',')
for x in data:
    x = x.split(':')[-1].strip()  # assign the result back to x; strip() drops the leading space
    print(x)
The script above prints out:
1
1518852709307
0
Here is a one-liner version (it works on the original dictionary, before the str() call):
print(list(data.values()))
Prints out:
[1, 1518852709307, 0]
Which is a list of integers.
Seems like you just want the values in the dictionary:
data = {"state":1,"endTime":1518852709307,"fileSize":000000}
for x in data:
    print(data[x])
I'm not sure, but I think it's because the computer treats "state" and 1 as separate objects. Therefore, it is merely stripping the string "state" of its colons, of which there are none.
You could turn the entire dictionary into a string with:
data = str({ Your Dictionary Here })
then strip and split it as in your code, and store the result of split in a new variable inside the "for x in data" loop, like so:
for x in data:
    b = x.split(':', 1)[-1]  # creating a new string
    print(b)
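The underlying reason the original loop printed unchanged values is that str.split returns a new list and leaves the original string untouched, so the result has to be assigned to a name. A small sketch, using a sample string in the same shape as the question's output:

```python
x = '"state":1'

# split() returns a new list; x itself is not modified.
x.split(':', 1)
unchanged = x

# Assigning the result keeps the part after the first colon.
value = x.split(':', 1)[-1]
print(unchanged)
print(value)
```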
data in your code is a dictionary, so you can access its values directly: data['state'] evaluates to 1.
If you get this data as a string like:
data = "{'state':1, 'endTime':1518852709307, 'fileSize':000000}"
You could use json.loads to convert it into a dictionary and access the data as explained above. Note that JSON requires double-quoted keys and does not allow numbers with leading zeros such as 000000, so the string has to look like the one in the example:
import json
data = '{"state":1, "endTime":1518852709307, "fileSize":0}'
data = json.loads(data)
for _, v in data.items():
    print(v)
If you want to parse the string yourself, this should work:
data = '{"state":1,"endTime":1518852709307,"fileSize":000000}'
data = data.strip('{}')
data = data.split(',')
for x in data:
    x = x.split(':')[-1]
    print(x)
I have a text file of countries and some describing coordinates, with the following format:
Country
57.7934235704;24.3128625831 58.3834133979;24.42892785 58.2573745795;24.0611983579 58.6127534044;23.4265600929
I'm having trouble converting the file into a Python dictionary with the country as the key and the value as a list of lists of float tuples, like so:
[[(57.7934235704, 24.3128625831), (58.3834133979, 24.42892785), (58.2573745795, 24.0611983579), (58.6127534044, 23.4265600929)]]
I've ended up with the following code, which as far as I understand adds the country as a key and converts the coordinates to floats individually. What's missing is a way to pair the floats into tuples and add them to their corresponding country.
import re
def read_country_file(filename):
    with open(filename) as file:
        dict = {}
        for line in file:
            line = line.rstrip().split(' ')
            for element in line:
                if re.match('^[A-Z]', element):  # if the line starts with a letter make it a key
                    country = (element[0:])
                    dict[country] = country
                elif re.match('^[-0-9;. ]', element):  # if the line starts with a number make it a value
                    element = element.split(';')
                    for i in element:
                        flo = float(i)
                        # MISSING: Tuple floats in pairs and add them to the dictionary
    return dict
If I look up a country in this dictionary, it finds the country/key correctly, but it has no values attached. If I type-test my "flo" value it's a float, so I have a feeling I'm almost there.
Python has no tuple comprehension as such, but you can pass a generator expression to tuple():
element = tuple(float(i) for i in element.split(';'))
Additionally, here is my solution to your problem:
import re
text = ['Vietnam',
'57.7934235704;24.3128625831 58.3834133979;24.42892785 58.2573745795;24.0611983579 58.6127534044;23.4265600929']
def get_tuples_of_float(string):
    return [tuple(map(float, j)) for j in re.findall(r'([\d.]+);([\d.]+)', string)]
it = iter(text)
output = {i: get_tuples_of_float(next(it)) for i in it if re.match('^[A-Z]', i)}
You can use re.findall:
import re
s = """
57.7934235704;24.3128625831 58.3834133979;24.42892785 58.2573745795;24.0611983579 58.6127534044;23.4265600929
"""
new_data = list(map(float, re.findall(r'[\d.]+', s)))  # list() so the result can be indexed in Python 3
final_data = {new_data[i]: new_data[i+1] for i in range(0, len(new_data), 2)}
Output:
{57.7934235704: 24.3128625831, 58.3834133979: 24.42892785, 58.2573745795: 24.0611983579, 58.6127534044: 23.4265600929}
Why not first split each line of text on spaces, then split each coordinate pair in the resulting list on the semicolon? You can then add everything to the country key in the dictionary.
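The approach described above could be sketched as follows. It assumes the file alternates between a country line and a coordinate line, as in the question. The function name parse_countries, and taking an iterable of lines rather than a filename, are choices made for this illustration. Each coordinate line is appended as one inner list to match the list-of-lists shape the question asks for.

```python
def parse_countries(lines):
    """Map each country name to a list of lists of (float, float) tuples."""
    result = {}
    country = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line[0].isalpha():
            # A line starting with a letter names a new country.
            country = line
            result[country] = []
        else:
            # Split on whitespace, then split each pair on the semicolon.
            # Assumes a country line has already been seen.
            pairs = [tuple(float(n) for n in pair.split(';'))
                     for pair in line.split()]
            result[country].append(pairs)
    return result


sample = [
    'Country',
    '57.7934235704;24.3128625831 58.3834133979;24.42892785',
]
print(parse_countries(sample))
```

Because an open file is itself an iterable of lines, the same function could be called as parse_countries(f) inside a with open(filename) as f: block.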
Consider the line below read in from a txt file:
EDIT: The text file has thousands of lines just like the one below:
TAG1=1494947148,1,d,ble,0,2,0,0&TAG2[]=0,229109531800552&TAG2[]=0,22910953180055 ...
In the line there would be some data that corresponds to TAG1 and lots of data that have &TAG2 at their start.
I want to make a dictionary that has further dictionaries within it, like
{
{'TAG1':1494947148,1,d,ble,0,2,0,0}
{'TAG2:
{'1': 0, '2':229109531800552}
{'1': 0, '2':22910953180055}
}
.
.
}
How do I split the string starting at TAG1 and stopping just before the ampersand before TAG2? Does python allow some way to check if a certain character(s) has been encountered and stop/start there?
I would turn them into a dictionary with string keys and lists of values. It doesn't matter whether a tag has one or more items; using lists throughout makes parsing simple. You can process the resulting dictionary further if you find that necessary.
The code discards the [] in tag names, since every value ends up in a list anyway.
from itertools import groupby
from operator import itemgetter
import re
s = "TAG1=1494947148,1,d,ble,0,2,0,0&TAG2[]=0,229109531800552&TAG2[]=0,22910953180055"
splitted = map(re.compile(r"(?:\[\])?=").split, s.split("&"))
tag_values = groupby(sorted(splitted, key=itemgetter(0)), key=itemgetter(0))
result = {t: [c[1].split(',') for c in v] for t, v in tag_values}
And when you print the result, you get:
print(result)
{'TAG2': [['0', '229109531800552'], ['0', '22910953180055']], 'TAG1': [['1494947148', '1', 'd', 'ble', '0', '2', '0', '0']]}
How it works
splitted = map(re.compile(r"(?:\[\])?=").split, s.split("&"))
First you split the line on &. That turns the line into small chunks like "TAG2[]=0,229109531800552"; map then splits each chunk into two parts, removing the = or []= between them.
tag_values = groupby(sorted(splitted, key=itemgetter(0)), key=itemgetter(0))
Because of the map function, splitted is an iterable that yields two-item lists when consumed. We sort them and then group them by tag (the string on the left of =). Now tag_values pairs each tag with all of its matching items (each item still includes the tag). Note that sorted consumes splitted immediately, so the splitting and sorting have already happened at this point; only the grouping done by groupby is still lazy.
result = {t: [c[1].split(',') for c in v] for t, v in tag_values}
The last line uses both a list comprehension and a dictionary comprehension. We want to turn the result into a dict mapping each tag to a list of values. The curly brackets are the dictionary comprehension. The variables t and v are unpacked from tag_values, where t is the tag and v is the group of matching items (again with the tag included). At the beginning of the curly brackets, t: means use t as a dictionary key; after the colon comes the key's matching value.
We want each dictionary value to be a list of lists. The square brackets are the list comprehension that consumes the iterable v and turns it into a list. The variable c represents each item in v, and because c has two items, the tag and the value string, c[1].split(',') takes the value part and splits it into a list. And there is your result.
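To see what each step produces, the same pipeline can be run with the intermediate results forced into lists. This is only an illustrative sketch for Python 3; the names chunks, pairs and grouped are introduced here and are not part of the answer's code.

```python
from itertools import groupby
from operator import itemgetter
import re

s = "TAG1=1494947148,1,d,ble,0,2,0,0&TAG2[]=0,229109531800552&TAG2[]=0,22910953180055"

chunks = s.split("&")
# Each chunk becomes [tag, values]; the optional "[]" before "=" is dropped.
pairs = [re.split(r"(?:\[\])?=", chunk) for chunk in chunks]
# groupby only groups adjacent items, hence the sort on the tag first.
grouped = [(tag, list(items))
           for tag, items in groupby(sorted(pairs, key=itemgetter(0)),
                                     key=itemgetter(0))]
result = {tag: [item[1].split(',') for item in items] for tag, items in grouped}
print(pairs)
print(result)
```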
Further Reading
You really ought to get familiar with list/dict comprehensions and generator expressions. Also take a look at yield if you want to get more done with Python, and learn itertools, functools and operator along the way. This is basically functional programming; Python is not a pure functional language, but these are powerful idioms you can use. Reading up on a functional language like Haskell would also improve your Python skills.
I think this might be what you need:
import json
data = "TAG1=1494947148,1,d,ble,0,2,0,0&TAG2[]=0,229109531800552&TAG2[]=0,22910953180055"
items = data.split("&")
res = {}
for item in items:
    key, value = item.split("=")
    key = key.replace("[]", "")
    values = value.split(",")
    if key in res:
        res[key].append(values)
    else:
        res[key] = [values]
print(res)
print(json.dumps(res))
The results:
{'TAG1': [['1494947148', '1', 'd', 'ble', '0', '2', '0', '0']],
'TAG2': [['0', '229109531800552'], ['0', '22910953180055']]}
{"TAG1": [["1494947148", "1", "d", "ble", "0", "2", "0", "0"]],
"TAG2": [["0", "229109531800552"], ["0", "22910953180055"]]}
This may help you:
string = 'TAG1=1494947148,1,d,ble,0,2,0,0&TAG2[]=0,229109531800552'
data = string.split('&')
print(data)
in_data_dic = {}
for i in data:
    in_data = i.split('=')
    in_data_dic[in_data[0]] = in_data[1]
print(in_data_dic)
Output:
['TAG1=1494947148,1,d,ble,0,2,0,0', 'TAG2[]=0,229109531800552']
{'TAG1': '1494947148,1,d,ble,0,2,0,0', 'TAG2[]': '0,229109531800552'}
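Since the line has the same key=value&key=value shape as a URL query string, the standard library's urllib.parse.parse_qs may also be worth a look. This is a swapped-in technique, not what the answers above do. It keeps the [] suffix in the key, so the sketch strips it afterwards, and it decodes percent-escapes and plus signs, so it only fits if the data contains neither.

```python
from urllib.parse import parse_qs

s = "TAG1=1494947148,1,d,ble,0,2,0,0&TAG2[]=0,229109531800552&TAG2[]=0,22910953180055"

# parse_qs collects repeated keys into a list of raw value strings.
parsed = parse_qs(s)

# Drop the trailing "[]" from keys and split each value on commas.
result = {key.replace("[]", ""): [v.split(",") for v in values]
          for key, values in parsed.items()}
print(result)
```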
I have a list of lists that I would like to convert to a list of strings, where the strings are the names of the variables. I would like to loop through the list and collect the length of each list into one list and the name of each list into another. To illustrate, here's an attempt using str(variable), which obviously doesn't work because it converts the value of the variable, not its name, and the name is what I'm after.
# Here are the initial lists
common_nouns = ['noun', 'fact', 'banana']
verbs = ['run', 'jump']
adjectives = ['red', 'blue']
# Here is my list of lists:
parts_of_speech = [common_nouns, verbs, adjectives]
labels=[]
data=[]
for pos in parts_of_speech:
    if pos != []:
        labels.append(str(pos))  # This part doesn't work
        data.append(len(pos))
result:
labels = ["['noun', 'fact', 'banana']", "['run', 'jump']", "['red', 'blue']"]
desired result:
labels = ['common_nouns', 'verbs', 'adjectives']
EDIT: Added initial lists
This is the opposite of the frequent question on how to have "variable variables". But the answer is exactly the same: don't do that, use a dict.
Store this data as a single dict with the variable names as the keys; then you can use the .keys() method to get the result you want.
I ended up doing a dictionary as Daniel Roseman suggested. Here is the implementation:
parts_of_speech = [common_nouns, verbs, adjectives]
all_labels = ['common_nouns', 'verbs', 'adjectives']
pos_dict = dict(zip(all_labels, parts_of_speech))
labels=[]
data=[]
for pos, lst in pos_dict.items():
    if lst:
        data.append(len(lst))
        labels.append(pos)
I am getting myself all tangled up in the nesting.
I have a list of python objects that look like this:
notes = [
{'id':1,
'title':'title1',
'text':'bla1 bla1 bla1',
'tags':['tag1a', ' tag1b', ' tag1c']},
{'id':2,
'title':'title2',
'text':'bla2 bla2 bla2',
'tags':[' tag2a', ' tag2b', ' tag2c']},
{'id':3,
'title':'title3',
'text':'bla3 bla3 bla3',
'tags':[' tag3a', ' tag3b', ' tag3c']}]
and so on.
I am trying to go into each dictionary in the list, strip the leading whitespace from the tags, and return a list of dictionaries where the only difference is that the tags have their unnecessary whitespace stripped.
The following code is what I am working with, but it is not right, and I don't know how to get to the result I need.
notes_cleaned = []
for objs in notes:
    for items in objs:
        notes_cleaned.append({'text':n['text'], 'id':n['id'], 'tags':[z.lstrip(' ') for z in n['tags']], 'title':n['title']})
This gives me an error saying that I can't use string indexes, which I understand, but I don't know how to do it right. I know that I have to iterate over each dictionary like:
for objs in notes:
    for items in objs:
        print(items, objs[items])
but I am confused about how to rebuild the dictionaries while digging into the tag lists specifically.
What am I missing here? (I know I am definitely missing something.)
I think this is enough:
for note in notes:
note['tags']= [t.strip() for t in note['tags']]
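If you really need to operate on a copy of notes, you can get one easily with copy = [dict(n) for n in notes] and then loop over copy instead. A shallow copy is enough here because the loop assigns a new 'tags' list rather than modifying the existing one.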
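If the original list must stay untouched, another option is to build new dictionaries while stripping, rather than copying first. A minimal sketch with a shortened, made-up sample in the same shape as the question's notes:

```python
notes = [
    {'id': 1, 'title': 'title1', 'tags': ['tag1a', ' tag1b', ' tag1c']},
    {'id': 2, 'title': 'title2', 'tags': [' tag2a', ' tag2b']},
]

# {**note, 'tags': ...} copies every key and overrides only 'tags'.
notes_cleaned = [{**note, 'tags': [t.strip() for t in note['tags']]}
                 for note in notes]

print(notes_cleaned[0]['tags'])
print(notes[0]['tags'])  # the original list is unchanged
```

The {**note, ...} unpacking needs Python 3.5 or later.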
Python 3.2:
# if you want only the keys whose value is a list, with every string in the list stripped
[{i: [j.strip() for j in v] for i, v in k.items() if isinstance(v, list)} for k in notes]
# the same, but keeping (and stripping) only the strings that contain whitespace
[{i: [j.strip() for j in v if " " in j] for i, v in k.items() if isinstance(v, list)} for k in notes]
The following code should work, assuming only "tags" needs to be stripped:
def clean(items):
    cleaned = []
    for objs in items:
        nObj = {}
        for item, obj in objs.items():  # items() rather than the Python 2-only iteritems()
            if item != "tags":
                nObj[item] = obj
            else:
                nObj["tags"] = [n.lstrip() for n in obj]
        cleaned.append(nObj)
    return cleaned