Splitting .txt file in python [duplicate] - python

This question already has answers here:
Customizing the separator in pandas read_csv
(4 answers)
Closed 2 years ago.
I have a .txt file that is separated as follows for multiple rows:
Vermont;VT;Tunbridge;95000204;Republican;John Kasich;36;0.319
When I read it with pandas, I only get one column.
How do I split the data in Python so that each separated value becomes its own column in a pandas DataFrame?
Thanks

Like this (see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html for more)
import pandas as pd
df = pd.read_csv('data.csv', sep=';')
print(df)
where data.csv is
Vermont;VT;Tunbridge;95000204;Republican;John Kasich;36;0.319
NewYork;VT;Tunbridge;95000204;Republican;John Kasich;36;0.88
output (the first line of the file is consumed as the header row, so only one data row remains)
Vermont VT Tunbridge ... John Kasich 36 0.319
0 NewYork VT Tunbridge ... John Kasich 36 0.88
[1 rows x 8 columns]
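Since the sample file has no header line, a variant that keeps both rows as data may be closer to what the question wants. The following is a minimal sketch, assuming the file has no header; the column names are hypothetical and chosen only for illustration, and an in-memory buffer stands in for the file.

```python
import io
import pandas as pd

# In-memory stand-in for the file; these are the two sample rows shown above.
raw = io.StringIO(
    "Vermont;VT;Tunbridge;95000204;Republican;John Kasich;36;0.319\n"
    "NewYork;VT;Tunbridge;95000204;Republican;John Kasich;36;0.88\n"
)

# Hypothetical column names (not given in the question).
columns = ["state", "abbr", "town", "fips", "party", "candidate", "votes", "share"]

# header=None tells pandas that the first line is data, not column labels.
df = pd.read_csv(raw, sep=";", header=None, names=columns)
print(df.shape)
```

With a real file, the buffer would be replaced by the file path, for example 'data.txt'.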

Related

How to split a string column into multiple columns? [duplicate]

This question already has answers here:
Split / Explode a column of dictionaries into separate columns with pandas
(13 answers)
Closed 14 days ago.
I have a data frame with one string column, and I'd like to split it into multiple columns, separating on ','. I want each new column to be named after the string that comes before ':' in the original column.
The column looks like this:
0 {"ID":"AP001","Name":"Anderson","Age":"23"}
1 {"ID":"AP002","Name":"Jasmine","Age":"36"}
2 {"ID":"AP003","Name":"Zack","Age":"28"}
3 {"ID":"AP004","Name":"Chole","Age":"39"}
And I want to split to this:
ID     Name      Age
AP001  Anderson  23
AP002  Jasmine   36
AP003  Zack      28
AP004  Chole     39
I have tried to split it by ',', but I'm not sure how to remove the string before ':' and use it as the column name.
data1 = data['demographic'].str.split(',',expand=True)
This is what I get after splitting it:
0             1                  2
"ID":"AP001"  "Name":"Anderson"  "Age":"23"
"ID":"AP002"  "Name":"Jasmine"   "Age":"36"
"ID":"AP003"  "Name":"Zack"      "Age":"28"
"ID":"AP004"  "Name":"Chole"     "Age":"39"
Does anyone know how to do it?
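You can parse each string into a dictionary with ast.literal_eval, then expand the dictionaries into columns with pd.json_normalize:
import ast
import pandas as pd
data1 = pd.json_normalize(data['demographic'].apply(ast.literal_eval))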
print(data1)
# Output
ID Name Age
0 AP001 Anderson 23
1 AP002 Jasmine 36
2 AP003 Zack 28
3 AP004 Chole 39
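Because the sample strings also happen to be valid JSON, json.loads might work as an alternative parser. This is a sketch under that assumption, not the answer's method; the two-row frame below is rebuilt from the question's sample.

```python
import json
import pandas as pd

# Two rows rebuilt from the question's sample column.
data = pd.DataFrame({"demographic": [
    '{"ID":"AP001","Name":"Anderson","Age":"23"}',
    '{"ID":"AP002","Name":"Jasmine","Age":"36"}',
]})

# Parse each JSON string into a dict, then build one column per key.
parsed = data["demographic"].apply(json.loads)
data1 = pd.DataFrame(parsed.tolist(), index=data.index)
print(data1)
```

The values stay strings here (Age is "23", not 23), so a numeric conversion would still be needed if numbers are wanted.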

Pandas how to make a transpose of data-frame to get values for the remaining two columns [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 1 year ago.
I have a DataFrame with these values:
name marks subject
mark 50 math
mark 75 french
tom 25 english
tom 30 Art
luca 100 math
luca 100 art
How can I transpose the DataFrame so it looks like this?
name  math  art  french  english
mark  50         75
tom         30           25
luca  100   100
tried:
df.T and df[['marks','subject']].T
but
This is a pivot, not a transpose. First we need to normalize the case of the subject column (the data contains both 'Art' and 'art'), then we pivot.
df['subject'] = df['subject'].str.lower()
df.pivot(index='name', columns='subject', values='marks')
See here for more info: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot.html#pandas.DataFrame.pivot
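For reference, here is a self-contained sketch of those two steps applied to the question's sample data. The variable name wide is only illustrative.

```python
import pandas as pd

# Sample data from the question, including the mixed-case "Art"/"art".
df = pd.DataFrame({
    "name": ["mark", "mark", "tom", "tom", "luca", "luca"],
    "marks": [50, 75, 25, 30, 100, 100],
    "subject": ["math", "french", "english", "Art", "math", "art"],
})

# Lower-case the subjects so "Art" and "art" end up in the same column.
df["subject"] = df["subject"].str.lower()

# One row per name, one column per subject; missing pairs become NaN.
wide = df.pivot(index="name", columns="subject", values="marks")
print(wide)
```

Note that pivot raises an error if the same name/subject pair appears more than once; in that case pivot_table with an aggregation function is the usual alternative.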

python pandas read dataframe and do not include index [duplicate]

This question already has answers here:
Pandas dataframe hide index functionality?
(8 answers)
Closed 3 years ago.
A very simple question:
I am reading an Excel sheet with Python, and I want to print the results without the automatic index that pandas adds.
import pandas as pd
x=pd.read_excel(r'2_56_01.276295.xlsx',index_col=None)
print(x[:3])
This prints the first 3 rows:
blahblah Street Borough
0 55 W 192 ST Bronx
1 2514 EAST TREMONT AV Bronx
2 877 INTERVALE AV Bronx
But I do not want the index.
print(x.to_string(index=False))
should do the trick.
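A small runnable sketch of that call follows. The frame is typed in by hand instead of read from the Excel file, and the way the sample values are divided between the blahblah and Street columns is an assumption.

```python
import pandas as pd

# Hand-built stand-in for the Excel sheet; the column split is assumed.
x = pd.DataFrame({
    "blahblah": [55, 2514, 877],
    "Street": ["W 192 ST", "EAST TREMONT AV", "INTERVALE AV"],
    "Borough": ["Bronx", "Bronx", "Bronx"],
})

# to_string(index=False) renders the frame as text without the row labels.
text = x[:3].to_string(index=False)
print(text)
```

The index still exists on the DataFrame; it is only left out of the printed text.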

Data Manipulation in Python [duplicate]

This question already has answers here:
Pandas groupby with delimiter join
(2 answers)
pandas groupby concatenate strings in multiple columns
(1 answer)
Closed 4 years ago.
I am dealing with a data set which has the following fields:
ID Person_Name Person_Country
110 Marc CA
110 Sean CN
111 Matt IN
111 Rob AU
112 Mike US
I want to group the data in the following way:
ID Person_Name Person_Country
110 Marc; Sean CA; CN
111 Matt; Rob IN; AU
112 Mike US
I tried using the built-in functions like .pivot_table() and .unstack(), but they weren't helpful since I am dealing with non-numeric data.
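One possible approach, in line with the linked duplicates about groupby with a delimiter join, is to group by ID and join the strings in each column. This is a sketch on the question's sample data, not an answer taken from the thread.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID": [110, 110, 111, 111, 112],
    "Person_Name": ["Marc", "Sean", "Matt", "Rob", "Mike"],
    "Person_Country": ["CA", "CN", "IN", "AU", "US"],
})

# For each ID, concatenate the strings of each column with "; ".
grouped = df.groupby("ID", as_index=False).agg({
    "Person_Name": "; ".join,
    "Person_Country": "; ".join,
})
print(grouped)
```

Because str.join works on strings, this avoids the numeric aggregation that pivot_table applies by default.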

Resetting pandas row index to start at number other than 0? [duplicate]

This question already has answers here:
Increase index of pandas DataFrame by one
(2 answers)
Closed 6 months ago.
Currently I am trying to read in a .csv file and then use the to_html() to create a table with indexing on the side. All lines of code here:
import pandas as pd
df = pd.read_csv('file.csv')
df.to_html('example.html')
As expected I am currently getting:
Year Population Annual Growth Rate
0 1950 2557628654 1.458
1 1951 2594919657 1.611
2 1952 2636732631 1.717
3 1953 2681994386 1.796
4 1954 2730149884 1.899
However I want to start the indexing at 2 instead of 0. For example:
Year Population Annual Growth Rate
2 1950 2557628654 1.458
3 1951 2594919657 1.611
4 1952 2636732631 1.717
5 1953 2681994386 1.796
6 1954 2730149884 1.899
I know I could achieve this outcome by adding two dummy rows in the .csv file and then deleting them with df.ix[], but I do not want to do this.
Is there a way to change the indexing to start at something other than 0 without having to add or delete rows in the .csv file?
Thanks!
I know it looks like a hack, but you can simply shift the index itself. For example:
df.index = df.index + 2
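To check that the shifted index carries through to the HTML output, here is a minimal sketch. It uses a hand-built frame with the first three sample rows instead of file.csv, and keeps the HTML in a string rather than writing example.html.

```python
import pandas as pd

# First three sample rows from the question, typed in by hand.
df = pd.DataFrame({
    "Year": [1950, 1951, 1952],
    "Population": [2557628654, 2594919657, 2636732631],
})

# Shift the default 0-based index so that it starts at 2.
df.index = df.index + 2

# to_html() without a path returns the markup as a string.
html = df.to_html()
print(df)
```

No rows are added or removed; only the row labels change.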
