Reading a Python DataFrame where one column is represented as a string - python

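Let's say I have a DataFrame df:
df
bbox
[34,23,2,3]
But when I access the first value, it is a string:
df['bbox'][0] = '[34,23,2,3]'
When I call list() on such a value (here one holding '[121.0, 204.0, 108.0, 147.0]'), I get single characters: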
list(df['bbox'][0]) = ['[', '1', '2', '1', '.', '0', ',', ' ', '2', '0', '4', '.', '0', ',', ' ', '1', '0', '8', '.', '0', ',', ' ', '1', '4', '7', '.', '0', ']']
How can I turn it into a normal list?

import ast
df['bbox'] = df['bbox'].apply(ast.literal_eval)
Hope this is what you need: ast.literal_eval parses each string as a Python literal, so every cell becomes a real list.
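A minimal sketch of the same fix end to end, assuming the column came from a CSV file (the inline csv_text below is made up to stand in for that file). It also shows the converters parameter of pandas' read_csv, which applies ast.literal_eval while reading, so the column never holds strings at all:

```python
import ast
import io

import pandas as pd

# Hypothetical CSV content: the bbox lists were written out as text,
# so read_csv sees each one as a plain string.
csv_text = 'bbox\n"[34, 23, 2, 3]"\n"[121.0, 204.0, 108.0, 147.0]"\n'

df = pd.read_csv(io.StringIO(csv_text))
print(type(df['bbox'][0]))  # <class 'str'>

# Option 1: parse each string into a real Python list after loading.
df['bbox'] = df['bbox'].apply(ast.literal_eval)
print(df['bbox'][0])  # [34, 23, 2, 3]

# Option 2: parse while loading, via read_csv's converters parameter.
df2 = pd.read_csv(io.StringIO(csv_text), converters={'bbox': ast.literal_eval})
print(df2['bbox'][1])  # [121.0, 204.0, 108.0, 147.0]
```

Because literal_eval only accepts Python literals (numbers, strings, lists, tuples, dicts and so on), a malformed cell raises an exception instead of running code.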

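There is a shorter alternative for a single value:
eval(df['bbox'][0])
returns your list, which is [121.0, 204.0, 108.0, 147.0]. Be aware that eval executes whatever code the string contains, so ast.literal_eval is the safer choice for data read from files.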

Related

Python, If statement returning true for e==' ' sometimes even when e doesn't equal ' '

I wrote a function to separate a date-and-time string into two strings. The string always has a space between the date and the time, so I thought I would append and pop the characters one by one until I reached the space.
This is the code:
def seprate_data_time(inlist):
    timelist = list(inlist)
    datelist = []
    # split Date and Time based on the space ' '
    print('>>>>>>>>')
    for e in timelist:
        print(timelist)
        if e == ' ':
            break
        datelist.append(timelist[0])
        timelist.pop(0)
    return [''.join(datelist), ''.join(timelist)]
It works for about 10 loops, then stops working, like this:
# This is an example of a correct one
['1', '/', '3', '1', '/', '2', '0', '2', '3', ' ', '6', ':', '5', '8', ':', '4', '1', ' ', 'P', 'M']
['/', '3', '1', '/', '2', '0', '2', '3', ' ', '6', ':', '5', '8', ':', '4', '1', ' ', 'P', 'M']
['3', '1', '/', '2', '0', '2', '3', ' ', '6', ':', '5', '8', ':', '4', '1', ' ', 'P', 'M']
['1', '/', '2', '0', '2', '3', ' ', '6', ':', '5', '8', ':', '4', '1', ' ', 'P', 'M']
['/', '2', '0', '2', '3', ' ', '6', ':', '5', '8', ':', '4', '1', ' ', 'P', 'M']
['2', '0', '2', '3', ' ', '6', ':', '5', '8', ':', '4', '1', ' ', 'P', 'M']
['0', '2', '3', ' ', '6', ':', '5', '8', ':', '4', '1', ' ', 'P', 'M']
['2', '3', ' ', '6', ':', '5', '8', ':', '4', '1', ' ', 'P', 'M']
['3', ' ', '6', ':', '5', '8', ':', '4', '1', ' ', 'P', 'M']
[' ', '6', ':', '5', '8', ':', '4', '1', ' ', 'P', 'M']
>>>>>>>>
# This is an example of the issue: it stops even before reaching the ' '
['2', '/', '1', '/', '2', '0', '2', '3', ' ', '1', '0', ':', '3', '0', ':', '2', '6', ' ', 'A', 'M']
['/', '1', '/', '2', '0', '2', '3', ' ', '1', '0', ':', '3', '0', ':', '2', '6', ' ', 'A', 'M']
['1', '/', '2', '0', '2', '3', ' ', '1', '0', ':', '3', '0', ':', '2', '6', ' ', 'A', 'M']
['/', '2', '0', '2', '3', ' ', '1', '0', ':', '3', '0', ':', '2', '6', ' ', 'A', 'M']
['2', '0', '2', '3', ' ', '1', '0', ':', '3', '0', ':', '2', '6', ' ', 'A', 'M']
>>>>>>>>
I tried a few things, such as switching the ' ' to 'S', but had no luck.
I also tried putting the popping and appending into an else branch, which didn't work either.
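Judging from the printed traces, the if statement is not misfiring. e comes from iterating over timelist while the loop body pops its first element, so after each pop the iterator skips ahead one position and e is no longer timelist[0]. In the second trace the iterator lands on the space while timelist[0] is still '2', so the loop breaks with a date part of '2/1/'. The first trace only looks correct by coincidence, and even there the date part keeps a trailing space. A minimal sketch that avoids mutating a list during iteration by using str.partition (str.split(' ', 1) would also work); separate_date_time is just an illustrative rename:

```python
def separate_date_time(text):
    """Split a 'date time' string at its first space into [date, time]."""
    # str.partition splits once, at the first occurrence of the separator,
    # and never modifies anything while scanning, so there is no index drift.
    date, _, time = text.partition(' ')
    return [date, time]


print(separate_date_time('1/31/2023 6:58:41 PM'))  # ['1/31/2023', '6:58:41 PM']
print(separate_date_time('2/1/2023 10:30:26 AM'))  # ['2/1/2023', '10:30:26 AM']
```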

String to array conversion in python [duplicate]

I have a string in my Python code like:
s = "[3705049, 3705078, 3705082, 3705086, 3705093, 3705096]"
Now I need to convert it to an array or list like:
arr = [3705049, 3705078, 3705082, 3705086, 3705093, 3705096]
I have tried this:
arr = list(s)
print(arr)
But it produces this output:
['[', '3', '7', '0', '5', '0', '4', '9', ',', ' ', '3', '7', '0', '5', '0', '7', '8', ',', ' ', '3', '7', '0', '5', '0', '8', '2', ',', ' ', '3', '7', '0', '5', '0', '8', '6', ',', ' ', '3', '7', '0', '5', '0', '9', '3', ',', ' ', '3', '7', '0', '5', '0', '9', '6', ']']
How can I fix this?
You can use json.loads to load your data, since the string is also valid JSON:
import json
string = "[3705049, 3705078, 3705082, 3705086, 3705093, 3705096]"
arr = [3705049, 3705078, 3705082, 3705086, 3705093, 3705096]
print(json.loads(string) == arr)  # True
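A sketch of two other ways to parse the same string with only the standard library. ast.literal_eval is the more general choice: unlike json.loads, it also accepts Python-style literals such as single-quoted strings. The manual split assumes the string always holds a non-empty, flat list of integers:

```python
import ast
import json

s = "[3705049, 3705078, 3705082, 3705086, 3705093, 3705096]"

# Option 1: parse the text as a Python literal.
arr = ast.literal_eval(s)

# Option 2: strip the brackets and split on commas; int() tolerates the
# leading space each piece keeps after split(',').
arr2 = [int(part) for part in s.strip('[]').split(',')]

print(arr == arr2)  # True

# json.loads only accepts JSON, so Python-style single quotes are rejected.
try:
    json.loads("['a', 'b']")
except json.JSONDecodeError:
    print("json.loads rejects single quotes")
print(ast.literal_eval("['a', 'b']"))  # ['a', 'b']
```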

Grouping Lists into specific groups

I'm wondering whether it is possible to split this data into specific groups that I could later place in a table.
This is the output I need to group. I converted it into a list so that I could easily divide it into table columns.
f=open("sample1.txt", "r")
f.read()
Here's the output:
'0245984300999992018010100004+14650+121050FM-12+004699999V0203001N00101090001CN008000199+02141+01971101171ADDAY141021AY241021GA1021+006001081GA2061+090001021GE19MSL +99999+99999GF106991021999006001999999KA1120N+02111MD1210141+9999MW1051REMSYN10498430 31558 63001 10214 20197 40117 52014 70544 82108 333 20211 55062 56999 59012 82820 86280 555 60973=\n'
Here's what I have done already: I turned it into a list, which resulted in this output:
with open('sample1.txt', 'r') as file:
    data = file.read().replace('\n', '')
print(list(data))
The Output:
['0', '2', '4', '5', '9', '8', '4', '3', '0', '0', '9', '9', '9', '9', '9', '2', '0', '1', '8', '0', '1', '0', '1', '0', '0', '0', '0', '4', '+', '1', '4', '6', '5', '0', '+', '1', '2', '1', '0', '5', '0', 'F', 'M', '-', '1', '2', '+', '0', '0', '4', '6', '9', '9', '9', '9', '9', 'V', '0', '2', '0', '3', '0', '0', '1', 'N', '0', '0', '1', '0', '1', '0', '9', '0', '0', '0', '1', 'C', 'N', '0', '0', '8', '0', '0', '0', '1', '9', '9', '+', '0', '2', '1', '4', '1', '+', '0', '1', '9', '7', '1', '1', '0', '1', '1', '7', '1', 'A', 'D', 'D', 'A', 'Y', '1', '4', '1', '0', '2', '1', 'A', 'Y', '2', '4', '1', '0', '2', '1', 'G', 'A', '1', '0', '2', '1', '+', '0', '0', '6', '0', '0', '1', '0', '8', '1', 'G', 'A', '2', '0', '6', '1', '+', '0', '9', '0', '0', '0', '1', '0', '2', '1', 'G', 'E', '1', '9', 'M', 'S', 'L', ' ', ' ', ' ', '+', '9', '9', '9', '9', '9', '+', '9', '9', '9', '9', '9', 'G', 'F', '1', '0', '6', '9', '9', '1', '0', '2', '1', '9', '9', '9', '0', '0', '6', '0', '0', '1', '9', '9', '9', '9', '9', '9', 'K', 'A', '1', '1', '2', '0', 'N', '+', '0', '2', '1', '1', '1', 'M', 'D', '1', '2', '1', '0', '1', '4', '1', '+', '9', '9', '9', '9', 'M', 'W', '1', '0', '5', '1', 'R', 'E', 'M', 'S', 'Y', 'N', '1', '0', '4', '9', '8', '4', '3', '0', ' ', '3', '1', '5', '5', '8', ' ', '6', '3', '0', '0', '1', ' ', '1', '0', '2', '1', '4', ' ', '2', '0', '1', '9', '7', ' ', '4', '0', '1', '1', '7', ' ', '5', '2', '0', '1', '4', ' ', '7', '0', '5', '4', '4', ' ', '8', '2', '1', '0', '8', ' ', '3', '3', '3', ' ', '2', '0', '2', '1', '1', ' ', '5', '5', '0', '6', '2', ' ', '5', '6', '9', '9', '9', ' ', '5', '9', '0', '1', '2', ' ', '8', '2', '8', '2', '0', ' ', '8', '6', '2', '8', '0', ' ', '5', '5', '5', ' ', '6', '0', '9', '7', '3', '=']
My goal is to group the characters into fields like this:
0245,984300,99999,2018,01,01,0000,4,+1....
The number of digits belonging to each column is predetermined; for example, there are always 4 digits in the first column and 6 in the second, and so on.
I was thinking of concatenating them, but I'm not sure whether that is possible.
You can use operator.itemgetter with slices:
from operator import itemgetter
g = itemgetter(slice(0, 4), slice(4, 10))
with open('sample1.txt') as file:
    for line in file:
        print(g(line))
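Or, even better, you can build the slices dynamically with itertools.accumulate, so that you only list the width of each field:
from itertools import accumulate
from operator import itemgetter
indexes = [4, 6, ...]  # width of each field
# accumulate([0] + indexes) gives the start offsets, accumulate(indexes) the end offsets
g = itemgetter(*map(slice, *map(accumulate, ([0] + indexes, indexes))))
Then proceed as before.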
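A runnable sketch of the slicing idea, using the field widths implied by the example grouping in the question (4, 6, 5, 4, 2, 2, 4, 1); the real record has more fields whose widths are not given here. Passing two accumulate iterators to map(slice, ...) turns the widths into consecutive start/end offsets:

```python
from itertools import accumulate
from operator import itemgetter

# Widths taken from the question's example grouping
# (0245,984300,99999,2018,01,01,0000,4); the full record has more fields.
widths = [4, 6, 5, 4, 2, 2, 4, 1]

# Start offsets: 0, 4, 10, 15, ...  End offsets: 4, 10, 15, ...
starts = accumulate([0] + widths)
ends = accumulate(widths)
# map stops at the shorter iterator, so we get exactly len(widths) slices.
get_fields = itemgetter(*map(slice, starts, ends))

record = "0245984300999992018010100004+14650+121050FM-12"
fields = get_fields(record)
print(",".join(fields))  # 0245,984300,99999,2018,01,01,0000,4
```

Note that itemgetter returns a tuple when given two or more items but a bare value when given only one, so a single-width spec would not produce a tuple.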
I would recommend naming everything if you actually want to use this data, and double-checking that all the lengths make sense. So to start you do
with open('sample1.txt', 'r') as file:
    # drop the trailing newline and '=' so the last group reads '60973'
    data = file.read().rstrip('\n=')
first, second, *rest = data.split()
if len(first) != 163:
    raise ValueError(f"The first part should be 163 characters long, but it's {len(first)}")
if len(second) != 85:
    raise ValueError(f"The second part should be 85 characters long, but it's {len(second)}")
Now you have three variables:
first is "0245984300999992018010100004+14650+121050FM-12+004699999V0203001N00101090001CN008000199+02141+01971101171ADDAY141021AY241021GA1021+006001081GA2061+090001021GE19MSL"
second is "+99999+99999GF106991021999006001999999KA1120N+02111MD1210141+9999MW1051REMSYN10498430"
rest is ['31558', '63001', '10214', '20197', '40117', '52014', '70544', '82108', '333', '20211', '55062', '56999', '59012', '82820', '86280', '555', '60973']
Then repeat the same idea on first. It splits into more than four pieces, so collect the extras with *:
date, *other_parts = first.split('+')
Then, to parse the first part, I would just write a list like:
something = date[0:4]
something_else = date[4:10]
third_thing = date[10:15]
year = date[15:19]
month = date[19:21]
day = date[21:23]
And so on. You can then use all these variables in the code that analyzes the data.
If this is some sort of standard format, you should look for a library that parses strings like that, or write one yourself.
Obviously, give the variables better names.
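Following the advice to name everything and check lengths, a minimal sketch that drives the slicing from a list of (name, width) pairs. The names field_1 to field_3 are placeholders (only year, month and day come from the answer), and FIELD_SPEC and parse_fixed_width are hypothetical names:

```python
from typing import Dict, List, Tuple

# Hypothetical spec: field_1..field_3 are placeholder names; the widths
# reproduce the slices [0:4], [4:10], [10:15], [15:19], [19:21], [21:23].
FIELD_SPEC: List[Tuple[str, int]] = [
    ("field_1", 4),
    ("field_2", 6),
    ("field_3", 5),
    ("year", 4),
    ("month", 2),
    ("day", 2),
]


def parse_fixed_width(text: str, spec: List[Tuple[str, int]]) -> Dict[str, str]:
    """Cut text into consecutive named fields, failing loudly if it is too short."""
    needed = sum(width for _, width in spec)
    if len(text) < needed:
        raise ValueError(f"expected at least {needed} characters, got {len(text)}")
    fields: Dict[str, str] = {}
    pos = 0
    for name, width in spec:
        fields[name] = text[pos:pos + width]
        pos += width
    return fields


# The part of `first` before its first '+', as produced by first.split('+').
date_part = "0245984300999992018010100004"
parsed = parse_fixed_width(date_part, FIELD_SPEC)
print(parsed["year"], parsed["month"], parsed["day"])  # 2018 01 01
```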

Replace numbers after a character using re.sub() on a list

I have a list of timestamp strings, and I want to delete all the digits after the '.'.
The list is:
['08-52-05.173735', '09-01-22.68835', '09-10-34.145061']
Below is the code I am using:
ignore_ms = [re.sub(r'(?<=\.).*$', ' ', y) for y in timestamp]
print (ignore_ms)
where timestamp is the list above. However, the result I get is:
['[', "'", '0', '8', '-', '5', '2', '-', '0', '5', '. ', '1', '7', '3', '7', '3', '5', "'", ',', ' ', "'", '0', '9', '-', '0', '1', '-', '2', '2', '. ', '6', '8', '8', '3', '5', "'", ',', ' ', "'", '0', '9', '-', '1', '0', '-', '3', '4', '. ', '1', '4', '5', '0', '6', '1', "'", ',', ' ',
The result I want is 08-52-05., 09-01-22., 09-10-34.
Any idea what is wrong with the code above?
Thanks.
Try using str.split:
l = ['08-52-05.173735', '09-01-22.68835', '09-10-34.145061']
print([i.split('.')[0]+'.' for i in l])
You could instead take a simpler approach that avoids regular expressions:
l = ['08-52-05.173735', '09-01-22.68835', '09-10-34.145061']
[i[:i.rfind('.') + 1] for i in l]
# ['08-52-05.', '09-01-22.', '09-10-34.']
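The character-by-character output suggests that timestamp was actually the string form of the list rather than a list, so the comprehension iterated over single characters and the lookbehind only matched right after each lone '.', turning it into '. '. Assuming that, a sketch that parses the string back into a list with ast.literal_eval and then applies the original pattern with an empty replacement instead of ' ':

```python
import ast
import re

# Assumption: timestamp holds the *string* form of the list, which is why the
# original comprehension iterated over single characters.
timestamp = "['08-52-05.173735', '09-01-22.68835', '09-10-34.145061']"

stamps = ast.literal_eval(timestamp) if isinstance(timestamp, str) else timestamp
# (?<=\.) keeps the '.' itself; .*$ matches everything after it.
ignore_ms = [re.sub(r'(?<=\.).*$', '', y) for y in stamps]
print(ignore_ms)  # ['08-52-05.', '09-01-22.', '09-10-34.']
```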

All string list to a numpy float array

I am reading a CSV file with pandas, and one column contains (3, 3)-shaped lists.
An example list is as follows.
[[45.70345721, -0.00014686, -1.679e-05], [-0.00012219, 45.70271889, 0.00012527], [-1.161e-05, 0.00013083, 45.70306778]]
I tried to convert this list to a NumPy float array with np.array(arr).astype(np.float), but it gives the following error:
ValueError: could not convert string to float:
When I looked for the root cause, I found that this value is entirely a string: print([i for i in arr]) gives the following, where every element is a single character.
['[', '[', '4', '5', '.', '7', '0', '3', '4', '5', '7', '2', '1', ',', ' ', '-', '0', '.', '0', '0', '0', '1', '4', '6', '8', '6', ',', ' ', '-', '1', '.', '6', '7', '9', 'e', '-', '0', '5', ']', ',', ' ', '[', '-', '0', '.', '0', '0', '0', '1', '2', '2', '1', '9', ',', ' ', '4', '5', '.', '7', '0', '2', '7', '1', '8', '8', '9', ',', ' ', '0', '.', '0', '0', '0', '1', '2', '5', '2', '7', ']', ',', ' ', '[', '-', '1', '.', '1', '6', '1', 'e', '-', '0', '5', ',', ' ', '0', '.', '0', '0', '0', '1', '3', '0', '8', '3', ',', ' ', '4', '5', '.', '7', '0', '3', '0', '6', '7', '7', '8', ']', ']']
How do I convert this list to a numpy float array?
EDIT
Here is a snapshot of part of my data frame.
When loaded, the data frame is in the format below; df here is a small example data frame.
df = pd.DataFrame(columns=["e_total"], data=[[['[', '[', '4', '5', '.', '7', '0', '3', '4', '5', '7', '2', '1', ',', ' ', '-', '0', '.', '0', '0', '0', '1', '4', '6', '8', '6', ',', ' ', '-', '1', '.', '6', '7', '9', 'e', '-', '0', '5', ']', ',', ' ', '[', '-', '0', '.', '0', '0', '0', '1', '2', '2', '1', '9', ',', ' ', '4', '5', '.', '7', '0', '2', '7', '1', '8', '8', '9', ',', ' ', '0', '.', '0', '0', '0', '1', '2', '5', '2', '7', ']', ',', ' ', '[', '-', '1', '.', '1', '6', '1', 'e', '-', '0', '5', ',', ' ', '0', '.', '0', '0', '0', '1', '3', '0', '8', '3', ',', ' ', '4', '5', '.', '7', '0', '3', '0', '6', '7', '7', '8', ']', ']']]])
Could someone help me convert this to a float array?
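You can probably use eval() to turn the entire string into an actual list. eval() is generally not good to use, but in this case it might be your best bet; ast.literal_eval() does the same job for plain literals like this one without executing arbitrary code.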
What you listed as your "example" is not correct. You are listing the result of your print statement and list comprehension. What is being stored as an entry for that column is a string.
You should be able to simply take each item and wrap it in eval:
eval(arr)
That should return a (3, 3)-shaped Python list, that is, three lists of three numbers. From there you can convert it to a NumPy array as necessary; the parsed values are already floats.
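A sketch of this approach for a whole column, using ast.literal_eval in place of eval. The helper name to_matrix is hypothetical; it also covers the EDIT's case, where a cell holds a list of single characters, by joining them back into one string first:

```python
import ast

import numpy as np
import pandas as pd

# The example value from the question, stored as text.
cell = ("[[45.70345721, -0.00014686, -1.679e-05], "
        "[-0.00012219, 45.70271889, 0.00012527], "
        "[-1.161e-05, 0.00013083, 45.70306778]]")


def to_matrix(value):
    """Convert one stored (3, 3) list into a float NumPy array (hypothetical helper)."""
    # The EDIT's DataFrame holds a list of single characters; rebuild the text first.
    if isinstance(value, list):
        value = ''.join(value)
    # literal_eval returns nested Python lists of floats; np.array makes them (3, 3).
    return np.array(ast.literal_eval(value), dtype=float)


df = pd.DataFrame({"e_total": [cell, cell]})
df["e_total"] = df["e_total"].apply(to_matrix)
print(df["e_total"][0].shape)  # (3, 3)

# Stack the per-row matrices into one (n_rows, 3, 3) array if needed.
stacked = np.stack(df["e_total"].tolist())
print(stacked.shape)  # (2, 3, 3)

# A cell stored as a list of characters, as in the EDIT, works too.
print(to_matrix(list(cell)).shape)  # (3, 3)
```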
Aren't the numbers in the lists already floats? If that is the case, just turning the list into an np.array will do what you are asking. You only need to do
np.array(arr)
If the numbers are actually strings, as you show in the second part, you will have to go through the list and convert each number individually, using either a nested loop or a nested list comprehension.
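The loop looks like this (using the builtin float, since np.float was only an alias for it and has been removed from recent NumPy):
new_list = []
for i in arr:
    new_list.append([])
    for j in i:
        new_list[-1].append(float(j))  # rebinding j alone would not change the list
The list comprehension looks like this:
new_list = [[float(j) for j in i] for i in arr]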
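If the numbers really are separate strings (unlike the single characters such as '[' and ',' shown in the question, which no numeric conversion can parse), a sketch comparing the nested comprehension with NumPy's own string-to-float cast, which needs no Python loop. The rows data is assumed for illustration, built from the question's values:

```python
import numpy as np

# Assumption: each number is already its own string.
rows = [["45.70345721", "-0.00014686", "-1.679e-05"],
        ["-0.00012219", "45.70271889", "0.00012527"],
        ["-1.161e-05", "0.00013083", "45.70306778"]]

# Nested list comprehension with the builtin float.
via_loop = np.array([[float(x) for x in row] for row in rows])

# np.array(rows) is a (3, 3) string array; astype(float) parses every element.
via_cast = np.array(rows).astype(float)

print(via_cast.shape)  # (3, 3)
print(np.allclose(via_loop, via_cast))  # True
```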
