I am relatively new to pandas and am trying to convert pandas DataFrame rows to lists of strings.
It works, but some of the values are strangely modified in the list: an "L" character is appended to them for some reason.
I appreciate your help very much.
>>data=pd.DataFrame(Data)
>>for r in data.iterrows():
>> r[1].tolist()
>>r[1]
a 16593
b 15
c 179.069
d 110000
e 5906
Name: 0, dtype: object
>>r[1].tolist()
[16593L, 15.0, 179.068851, 110000.0, 5906L]
In fact, I figured out that the numbers with an appended L are integers; for floats it works.
Every column in the DataFrame has a specific "type" associated with it.
Typically this means they are of type "string", "int", or "float".
Right now, your .tolist() call converts the row into a list, but it doesn't necessarily change the type of all the values into a string.
When you type a list into the console, Python uses the "repr" method to find a string representation of the list. This involves putting in the brackets and calling "repr" on each of the elements. This is slightly different than casting the value to a string, which is done with the "str" method.
You can test this out for yourself:
# For regular ints, repr and str do the same thing
a = 5
str(a) #'5'
repr(a) #'5'
# The L means it's a *long*, basically an int with a higher max-value
a = 5L
str(a) #'5'
repr(a) #'5L'
*Note: this isn't the case in Python 3, where all ints are automatically 'long', so there is no L suffix as it would be redundant (the literal 5L is a syntax error there).
So, in the end, if you really want to convert your list of various types (float, int, str, depending on each column) to strings, you could use something like this:
my_list = [str(x) for x in my_list]
However, if you plan on doing some processing using these numbers, it's better to just leave them as their numerical type rather than convert back and forth to string.
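As a small sketch of the conversion above, here are the values from the question's row (copied by hand into a plain list, so no pandas is needed) passed through str(). This is written for Python 3, where ints carry no L suffix under either str or repr:

```python
# Values taken from the row shown in the question.
my_list = [16593, 15.0, 179.068851, 110000.0, 5906]

# str() gives the plain text form of every element.
as_strings = [str(x) for x in my_list]

print(as_strings)
```

The list now holds only str objects, so anything that needs arithmetic should be done before this step.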
python version: 3.10.2
It was my understanding that list[str|int] should allow three distinct types of values:
List of only strings (list[str])
List of only ints (list[int])
List of both strings and ints (list[str|int])
Is this assumption right, or am I misunderstanding it? Because that's not what happens when I'm coding:
def test(x: list[str|int]):
    pass
# All fine
test([]); test(['a']); test([0]); test(['a', 0])
x: list[str] = []
y: list[int] = []
z: list[str|int] = []
test(x); test(y); test(z)
It accepts lists with only strings or only ints, but only if I don't specify that the list in question only accepts one of these values. x and y are explicitly declared as lists that accept only one of these types (str or int, respectively), and because of that they are rejected. Is this really what is supposed to happen, or is it a bug?
If I change the function annotation to accept a Sequence[str|int] instead of a list[str|int], then everything is fine. And same for Container[str|int], or even tuple[str|int, ...]
Here's the problem:
The type checker doesn't know or care what test actually does with the list. It only cares about what test might do to it. Since you declared the parameter x as list[str|int], test is allowed to add a str or an int to it. Adding a str is fine if the caller's list has type list[str|int], but not if it has type list[int]; likewise, adding an int would break a list[str]. Only a list[str|int] is safe for every operation test is allowed to perform, so that is the only list type accepted.
That is, test does not accept lists that have either str or int values: it accepts lists that must be able to contain str or int values.
The problem goes away when you change the type of the parameter to Sequence[str|int], because a Sequence does not permit mutation. The type guarantees that if you provide a list[int] value, test won't try to add a str to the list.
For example, is it possible to convert the input
x = 10hr
into something like
y = 10
z = hr
I considered slicing, but the individual parts of the string will never be of a fixed length -- for example, the base string could also be something like 365d or 9minutes.
I'm aware of split() and re.match which can separate items into a list/group based on delimitation. But I'm curious what the shortest way to split a string containing a string and an integer into two separate variables is, without having to reassign the elements of the list.
You could use list comprehensions and join the results into strings:
x='10hr'
digits="".join([i for i in x if not i.isalpha()])
letters="".join([i for i in x if i.isalpha()])
You don't need some fancy function or regex for your use case
x = '10hr'
i = 0
while i < len(x) and x[i].isdigit():
    i += 1
This solution assumes the string is in the format you mentioned: 10hr, 365d, 9minutes, etc.
The loop above gives you i, the index of the first character of the string part:
>>i
2
>>x[:i]
'10'
>>x[i:]
'hr'
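To get both parts into separate variables in one step, the loop can be wrapped in a small helper that returns a tuple, which is then unpacked. This is only a sketch: split_number_unit is a hypothetical name, and it assumes the string starts with at least one digit (otherwise int('') raises ValueError).

```python
def split_number_unit(text):
    """Split a string such as '10hr' into (10, 'hr')."""
    i = 0
    # Advance past the leading digits, stopping at the end of the string.
    while i < len(text) and text[i].isdigit():
        i += 1
    return int(text[:i]), text[i:]


# Tuple unpacking assigns both parts at once.
y, z = split_number_unit('10hr')
print(y, z)
```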
I have the following array:
a =['1','2']
I want to convert this array into the below format :
a=[1,2]
How can I do that?
You can do it like this, converting each element of a (which are strings) into an integer:
a=[int(x) for x in a]
The single inverted commas you are talking about mark the difference between str and int. This is pretty basic Python stuff.
A string is a sequence of characters, displayed with inverted commas around it. 'Hello' is a string, but '1' can be a string too.
In your case ['1','2'] is a list of strings, and [1,2] is a list of numbers.
To convert a string to an int, you can do what is called casting. This is converting one type to another (the value has to be compatible, though). Casting 'hello' to a number doesn't make sense and raises a ValueError.
Casting '1' to a number is possible by calling int('1'), which results in 1.
In your case you can cast all elements in your list by calling a = [int(x) for x in a].
For more info on types see this article.
For information on list comprehensions (What I used to change your list) see this article.
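A short sketch of both cases described above: the conversion that works, and what happens with a string that is not a number. The variable message exists only for this illustration.

```python
a = ['1', '2']

# Convert every string in the list to an int.
a = [int(x) for x in a]
print(a)

# A string that does not represent a number cannot be converted.
try:
    int('hello')
except ValueError as error:
    message = str(error)
    print(message)
```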
I am trying to find the sum of all numbers in a list but every time I try I get an error that it cannot convert the string to float. Here is what I have so far.
loop = True
float('elec_used')
while (loop):
    totalelec = sum('elec_used')
    print (totalelec)
    loop = False
You need none of the code above. The while loop is unnecessary: it exits after a single iteration, so it isn't doing anything useful. If you're simply summing all the values in the list:
sum([float(i) for i in elec_used])
If this produces errors, please post your elec_used list. It probably contains string values or blank spaces.
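If blank entries do turn out to be the problem, one option is to skip them while summing. This is a sketch with made-up sample data, since the real elec_used list was not posted:

```python
# Hypothetical readings; the empty string stands in for a blank entry.
elec_used = ['12.5', ' 7.25', '', '3']

# Skip entries that are empty or whitespace-only, convert the rest.
totalelec = sum(float(value) for value in elec_used if value.strip())
print(totalelec)
```

Whether blanks should be skipped or treated as an error depends on where the data comes from.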
'elec_used' (in quotes) is a string literal, not your variable, so float('elec_used') tries to convert the characters themselves to a number and fails. Remove the quotes to refer to the list itself. A string can be converted to a float only if it is numeric. For example:
>>> number_string = '123.5'
>>> float(number_string)
123.5
Now for your second part, calculating the sum of the numbers. Let's say you have a string of multiple numbers. First .split() the string, type-cast each item to float, and then calculate the sum(). For example:
>>> number_string = '123.5 345.7 789.4'
>>> splitted_num_string = number_string.split()
>>> number_list = [float(num) for num in splitted_num_string]
>>> sum(number_list)
1258.6
This could be written in one line using a generator expression:
>>> sum(float(item) for item in number_string.split())
1258.6
OR, using map() as:
>>> sum(map(float, number_string.split()))
1258.6
I have a parsing system for fixed-length text records based on a layout table:
parse_table = [\
('name', type, length),
....
('numeric_field', int, 10), # int example
('textc_field', str, 100), # string example
...
]
The idea is that given a table for a message type, I just go through the string, and reconstruct a dictionary out of it, according to entries in the table.
Now, I can handle strings and proper integers, but int() will not parse all-spaces fields (for a good reason, of course).
I wanted to handle it by defining a subclass of int that handles blank strings. This way I could go and change the type of appropriate table entries without introducing additional kludges in the parsing code (like filters), and it would "just work".
But I can't figure out how to override the constructor of a built-in type in a sub-type, as defining a constructor in the subclass does not seem to help. I feel I'm missing something fundamental here about how Python built-in types work.
How should I approach this? I'm also open to alternatives that don't add too much complexity.
Use int() function with the argument s.strip() or 0, i.e:
int(s.strip() or 0)
Or if you know that the string will always contain only digit characters or is empty (""), then just:
int(s or 0)
In your specific case you can use a lambda expression, e.g.:
parse_table = [\
....
('numeric_field', lambda s: int(s.strip() or 0), 10), # int example
...
]
Use a factory function instead of int or a subclass of int:
def mk_int(s):
    s = s.strip()
    return int(s) if s else 0
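To show how the factory function slots into a layout table, here is a rough sketch of a parser loop. The field names, widths and parse_record are invented for illustration; only mk_int comes from the answer above.

```python
def mk_int(s):
    s = s.strip()
    return int(s) if s else 0


# Hypothetical layout: (field name, converter, field width).
parse_table = [
    ('record_id', mk_int, 5),
    ('name', str, 10),
    ('quantity', mk_int, 4),
]


def parse_record(line, table):
    """Cut a fixed-length record into fields and convert each one."""
    record = {}
    offset = 0
    for name, convert, length in table:
        record[name] = convert(line[offset:offset + length])
        offset += length
    return record


# The quantity field is all spaces, which plain int() would reject.
line = '00042' + 'widget    ' + '    '
parsed = parse_record(line, parse_table)
print(parsed)
```

The parsing code stays unchanged; only the table entry decides how blanks are handled.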
lenient_int = lambda string: int(string) if string.strip() else None
#else 0
#else ???
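On the original question about subclassing: int is immutable, so its value is fixed in __new__ rather than in __init__, which is why defining __init__ in the subclass has no effect. A minimal sketch of the subclass approach follows. BlankInt is a hypothetical name, and mapping blank input to 0 is an assumption; pick whatever default suits your records.

```python
class BlankInt(int):
    """An int that treats empty or whitespace-only strings as 0."""

    def __new__(cls, value=0):
        # Replace blank strings before int's own construction runs.
        if isinstance(value, str) and not value.strip():
            value = 0
        return super().__new__(cls, value)


blank = BlankInt('    ')
filled = BlankInt(' 12 ')
print(blank, filled)
```

Instances are still ints, so they can be used anywhere the parsed values were used before.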
Note that mylist is a list of tuples, and each tuple may contain:
i) null / empty values,
ii) digits or numbers stored as strings, and
iii) empty / null lists. For example:
mylist=[('','1',[]),('',[],2)]
@Arlaharen, I am repeating your solution here in slightly different words so that I can add keywords, because I lost a lot of time trying to find it!
The following solution converts null strings, empty strings and empty lists to zero, while non-empty strings that contain digits / numbers are converted to the corresponding numbers.
Simple solution. Note that the "0" indexes can be replaced by loop variables when iterating.
Note that the first solution cannot handle empty lists inside tuples, because a list has no strip() method.
int(mylist[0][0]) if mylist[0][0].strip() else 0
I found an even simpler way, which can also handle empty lists in a tuple:
int(mylist[0][0] or '0')
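Applied to every element of the example list, the second form might look like the sketch below. One caveat: a whitespace-only string such as '   ' is truthy, so it is passed to int() unchanged and raises ValueError.

```python
mylist = [('', '1', []), ('', [], 2)]

# Empty strings and empty lists are falsy, so "or '0'" replaces them.
converted = [[int(item or '0') for item in row] for row in mylist]
print(converted)
```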
convert string to digits / convert string to number / convert string to integer
strip empty lists / strip empty string / treat empty string as digit / number
convert null string as digit / number / convert null string as integer