Python splitting list by whitespace - python

I have a text file with numbers in the following manner: "12345679010111213"
I have constructed a script that reads the file and appends the values to a list via a variable called "numbersoflist": list1.append(numbersoflist)
But when I call list1.split('') it still prints out the values as they appear in the text file, without whitespace. My goal is to have them look like "1 2 3 4 5 6 ..."

>>> s = '12345679010111213'
>>> list(s)
['1', '2', '3', '4', '5', '6', '7', '9', '0', '1', '0', '1', '1', '1', '2', '1', '3']
>>> ' '.join(list(s))
'1 2 3 4 5 6 7 9 0 1 0 1 1 1 2 1 3'
>>> ' '.join(s) # works since str is also an iterable
'1 2 3 4 5 6 7 9 0 1 0 1 1 1 2 1 3'
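The file-reading step from the question can be folded into the same idea; a minimal sketch (the helper name and the filename numbers.txt are made up for illustration):

```python
def spaced_digits(path):
    """Read a digit string from a file and return it with a space between each digit."""
    with open(path) as f:
        # str is iterable, so ' '.join walks it character by character
        return ' '.join(f.read().strip())

# e.g. if numbers.txt contains "12345679010111213":
# spaced_digits('numbers.txt') -> '1 2 3 4 5 6 7 9 0 1 0 1 1 1 2 1 3'
```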

Related

is there a way to make columns without pandas?

I have a list of lists and I need to put them into columns, but I can't use pandas to make the columns. The list looks like this:
list_a = [['face1', 'face2', 'object', 'scene'], ['1', '7', '6', '5'], ['4', '3', '2', '8'], ['1', '3', '2', '4'], ['1', '2', '3', '4']]
and I want it to come out in columns like this
face1 face2 object scene
1 4 1 1
7 3 3 2
6 2 2 3
5 8 4 4
You can use string formatting to print columns.
for row in list_a:
    print(''.join(f'{x:^8}' for x in row))  # 8-character wide centered columns
# output
 face1   face2   object  scene
   1       7       6       5
   4       3       2       8
   1       3       2       4
   1       2       3       4
try using center:
list_a = [['face1', 'face2', 'object', 'scene'], ['1', '7', '6', '5'], ['4', '3', '2', '8'], ['1', '3', '2', '4'],
          ['1', '2', '3', '4']]
for item in list_a:
    for subitem in item:
        print(subitem.center(10), end='')
    print()
output:
  face1     face2     object    scene
    1         7         6         5
    4         3         2         8
    1         3         2         4
    1         2         3         4
Note: if your list contains a value other than a string, don't forget to convert it to a string before calling .center on it:
print(str(subitem).center(10), end='')
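Note that both answers print the rows in their stored order; the layout shown in the question (each inner data list running down a column) is the transpose, which zip(*rows) produces. A sketch, with the helper name to_columns made up for illustration:

```python
def to_columns(table):
    """Format a list of lists so each inner data list becomes a column."""
    header, *rows = table
    lines = [''.join(f'{h:^8}' for h in header)]
    # zip(*rows) transposes: the i-th tuple holds the i-th element of every row
    lines += [''.join(f'{v:^8}' for v in cols) for cols in zip(*rows)]
    return lines

list_a = [['face1', 'face2', 'object', 'scene'], ['1', '7', '6', '5'],
          ['4', '3', '2', '8'], ['1', '3', '2', '4'], ['1', '2', '3', '4']]
print('\n'.join(to_columns(list_a)))
```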

Counting list entries in specific columns in Pandas Dataframe

I have a Dataframe like this
                 1                2  ...           9
0            ['1']               []         ['9']
1            ['1']  ['2', '2', '2']    ['9', '9']
2       ['1', '1']  ['2', '2', '2']            []
3  ['1', '1', '1']            ['2']            []
I want to count the occurences in each column so that the output would be like this
   1  2  ...  9
0  1  0       1
1  1  3       2
2  2  3       0
3  3  1       0
This seems to work with the following code
df['1'].apply(lambda x: x.count('1'))
but how can I automate it for all my columns so I don't have to run the above code for each individual column?
In addition I used
df['1'].apply(lambda x: x.count('1')).sum()
to count the total for the rows which seems to be giving the right answer. Is there a better way though?
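Assuming every cell holds a list, as in the frame shown, applying len element-wise answers the "all columns" part in one pass (each column's lists only contain that column's value, so len equals the count). A sketch using applymap, with the question's data retyped by hand:

```python
import pandas as pd

df = pd.DataFrame({'1': [['1'], ['1'], ['1', '1'], ['1', '1', '1']],
                   '2': [[], ['2', '2', '2'], ['2', '2', '2'], ['2']],
                   '9': [['9'], ['9', '9'], [], []]})

counts = df.applymap(len)   # len of each cell's list == number of entries
totals = counts.sum()       # per-column totals, replacing the per-column .sum() calls
```

(Newer pandas releases prefer DataFrame.map over applymap, but both do element-wise application here.)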

How to sort the values in dataframe?

I am trying to sort the values but am not getting the desired result. Can you please help me with how to do this?
Example:
df = pd.read_csv("D:/Users/SPate233/Downloads/iMedical/sqoop/New folder/metadata_file_imedical.txt", delimiter='~')
#df.sort_values(by = ['dependency'], inplace = True)
df.sort_values('dependency', ascending=True, inplace=True)
print(list(df['dependency'].unique()))
Output:
['0', '1', '1,10,11,26,28,55', '1,26,28', '10', '11', '12', '17,42', '2', '26,28', '33', '42', '6']
Desirable_output:
['0', '1', '2', '6', '10', '11', '12', '33', '42', '17,42', '26,28', '1,26,28', '1,10,11,26,28,55']
Order by the length of the string, and then by its value:
df.assign(len = df.dependency.str.len()).sort_values(["len", "dependency"])
The output is (leaving the len column in for clarity):
          dependency  len
0                  0    1
1                  1    1
8                  2    1
12                 6    1
4                 10    2
5                 11    2
6                 12    2
10                33    2
11                42    2
7              17,42    5
9              26,28    5
3            1,26,28    7
2   1,10,11,26,28,55   16
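On pandas 1.1+ the helper column can be avoided with sort_values' key parameter; sorting by value first and then doing a stable sort by length keeps the value order within each length. A sketch of this alternative, not from the answer above:

```python
import pandas as pd

df = pd.DataFrame({'dependency': ['0', '1', '1,10,11,26,28,55', '1,26,28', '10', '11',
                                  '12', '17,42', '2', '26,28', '33', '42', '6']})

out = (df.sort_values('dependency')                     # secondary key: the value itself
         .sort_values('dependency',
                      key=lambda s: s.str.len(),        # primary key: string length
                      kind='stable'))                   # preserve ties from the first sort
```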

Concatenate strings based on inner join

I have two DataFrames containing the same columns; an id, a date and a str:
df1 = pd.DataFrame({'id': ['1', '2', '3', '4', '10'],
                    'date': ['4', '5', '6', '7', '8'],
                    'str': ['a', 'b', 'c', 'd', 'e']})
df2 = pd.DataFrame({'id': ['1', '2', '3', '4', '12'],
                    'date': ['4', '5', '6', '7', '8'],
                    'str': ['A', 'B', 'C', 'D', 'Q']})
I would like to join these two datasets on the id and date columns, and create a resulting column that is the concatenation of str:
df3 = pd.DataFrame({'id': ['1', '2', '3', '4', '10', '12'],
                    'date': ['4', '5', '6', '7', '8', '8'],
                    'str': ['aA', 'bB', 'cC', 'dD', 'e', 'Q']})
I guess I can make an inner join and then concatenate the strings, but is there an easier way to achieve this?
IIUC, concat + groupby:
pd.concat([df1,df2]).groupby(['date','id']).str.sum().reset_index()
Out[9]:
  date  id str
0    4   1  aA
1    5   2  bB
2    6   3  cC
3    7   4  dD
4    8  10   e
5    8  12   Q
And if we consider efficiency, using sum() based on the index levels:
pd.concat([df1,df2]).set_index(['date','id']).sum(level=[0,1]).reset_index()
Out[12]:
  date  id str
0    4   1  aA
1    5   2  bB
2    6   3  cC
3    7   4  dD
4    8  10   e
5    8  12   Q
Using radd:
i = df1.set_index(['date', 'id'])
j = df2.set_index(['date', 'id'])
j['str'].radd(i['str'], fill_value='').reset_index()
  date  id str
0    4   1  aA
1    5   2  bB
2    6   3  cC
3    7   4  dD
4    8  10   e
5    8  12   Q
This should be pretty fast.
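The explicit-join route the asker mentions also works; a sketch using an outer merge (so unmatched ids survive) and concatenating the two str columns after filling the gaps:

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['1', '2', '3', '4', '10'],
                    'date': ['4', '5', '6', '7', '8'],
                    'str': ['a', 'b', 'c', 'd', 'e']})
df2 = pd.DataFrame({'id': ['1', '2', '3', '4', '12'],
                    'date': ['4', '5', '6', '7', '8'],
                    'str': ['A', 'B', 'C', 'D', 'Q']})

m = df1.merge(df2, on=['id', 'date'], how='outer', suffixes=('_1', '_2'))
m['str'] = m['str_1'].fillna('') + m['str_2'].fillna('')   # concatenate, NaN -> ''
df3 = m[['id', 'date', 'str']]
```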

Converting a multiline string to a dataframe

I have the following string:
Hoy
1
5
14
3
0
23
and I would like to turn it into a df.
I thought it would be a good idea to turn it into a list and then call pd.DataFrame(list(string)); however, list(string) returns the following output:
['\n', 'H', 'o', 'y', '\n', '1', '\n', '5', '\n', '1', '4', '\n', '3', '\n', '0', '\n', '2', '3', '\n', '2', ',', '8', '3', '*', '\n']
Is there an alternative way to turn the initial string into a df such like this?:
     0
0  Hoy
1    1
2    5
3   14
4    3
5    0
6   23
Use pd.read_csv, passing an IO buffer to it:
from io import StringIO
text = '''Hoy
1
5
14
3
0
23
'''
pd.read_csv(StringIO(text), header=None)
     0
0  Hoy
1    1
2    5
3   14
4    3
5    0
6   23
This should act as an argument for accepting #COLDSPEED's answer by observing how ugly this answer is.
txt = """Hoy
1
5
14
3
0
23"""
(lambda x: pd.Series(pd.to_numeric(x[1:], 'ignore'), name=x[0]))(
    txt.split('\n')
).to_frame()
   Hoy
0    1
1    5
2   14
3    3
4    0
5   23
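For completeness, splitlines gets there without StringIO or the lambda (a sketch; the numeric rows stay strings here, unlike the to_numeric version above):

```python
import pandas as pd

text = '''Hoy
1
5
14
3
0
23
'''
# splitlines drops the trailing newline handling that list(string) tripped over
df = pd.DataFrame(text.strip().splitlines())
```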
