How to read file in pandas with unfix whitespace/s separation?

How to read file in pandas with unfix whitespace/s separation? - python

I have a textfile that contains 2 columns of data. They are separated with unfix number of whitespace/s. I want to load it on a pandas DataFrame.
Example:
306.000000 1.125783
307.000000 0.008101
308.000000 -0.005917
309.000000 0.003784
310.000000 -0.516513
Please note that it also starts with whitespace/s.
My desired output would be like:
output = {'Wavelength': [306.000000, 307.000000, 308.000000, 309.000000, 310.000000],
'Reflectance': [1.125783, 0.008101, -0.005917, 0.003784, -0.516513]}
df = pd.DataFrame(data=output)

Use read_csv:
df = pd.read_csv('file.txt', sep='\\s+', names=['Wavelength', 'Reflectance'], header=None)

Related

How do I capture the properties I want from a string?

I hope you are well I have the following string:
"{\"code\":0,\"description\":\"Done\",\"response\":{\"id\":\"8-717-2346\",\"idType\":\"CIP\",\"suscriptionId\":\"92118213\"},....\"childProducts\":[]}}"...
To which I'm trying to capture the attributes: id, idType and subscriptionId and map them as a dataframe, but the entire body of the .cvs puts it in a single row so it is almost impossible for me to work without index
desired output:
id, idType, suscriptionID
0. '7-84-1811', 'CIP', 21312421412
1. '1-232-42', 'IO' , 21421e324
My code:
import pandas as pd
import json
path = '/example.csv'
df = pd.read_csv(path)
normalize_df = json.load(df)
print(df)

Considering your string is in JSON format, you can do this.
drop columns, transpose, and get headers right.
toEscape = "{\"code\":0,\"description\":\"Done\",\"response\":{\"id\":\"8-717-2346\",\"idType\":\"CIP\",\"suscriptionId\":\"92118213\"}}"
json_string = toEscape.encode('utf-8').decode('unicode_escape')
df = pd.read_json(json_string)
df = df.drop(["code","description"], axis=1)
df = df.transpose().reset_index().drop("index", axis=1)
df.to_csv("user_details.csv")
the output looks like this:
id idType suscriptionId
0 8-717-2346 CIP 92118213
Thank you for the question.

Is there any method to replace specific data from column without breaking its structure or spliting

Hi there i am trying to figure out how to replace a specific data of csv file. i have a file which is base or location data of id's.
https://store8.gofile.io/download/5b031959-e0b0-4dbf-aec6-264e0b87fd09/service%20block.xlsx (sheet 2 had data ).
The file which i want to replace data using id is below
https://store8.gofile.io/download/6e13a19a-bac8-4d16-8692-e4435eed2a08/Serp.csv
Highlighted part need to be deleted after filling location.
import pandas as pd
df1= pd.read_excel("serp.xlsx", header=None)
df2= pd.read_excel("flocnam.xlsx", header=None)
df1 = df1[0].str.split(";", expand=True)
df1[4] = df1[4].apply(lambda x: v[-1] if (v := x.split()) else "")
df2[1] = df2[1].apply(lambda x: x.split("-")[0])
m = dict(zip(df2[1], df2[0]))
df1[4]= df1[4].replace(m)
print(df1)
df1.to_csv ("test.csv")
It worked but not how i wanted.
https://store8.gofile.io/download/c0ae7e05-c0e2-4f43-9d13-da12ddf73a8d/test.csv
trying to replace it like this.(desired output)
Thank you for being Supportive community❤️

If I understand correctly, you simply need to specify the separator ;
>>> df.to_csv(‘test.csv’, sep=‘;’, index_label=False)

Parsing data in Excel using python

In Excel, I have to separate the following value from one cell into two:
2016-12-12 (r=0.1)
2016-12-13* (r=0.7)
How do I do that in Python so that in the Excel file, dates and "r=#" will be in different cells? And also, is there a way to automatically remove the "*" sign?

This task is pretty straight forward if you use pandas:
Build a test file:
import pandas as pd
df_out = pd.DataFrame(
['2016-12-12 (r=0.1)', '2016-12-13* (r=0.7)'], columns=['data'])
df_out.to_excel('test.xlsx')
Code to convert string:
def convert_date(row):
return pd.Series([c.strip('*').strip('(').strip(')')
for c in row.split()])
Test code:
# read in test file
df_in = pd.read_excel('test.xlsx')
print(df_in)
# build a new dataframe
df_new = df_in['data'].apply(convert_date)
df_new.columns = ['date', 'r']
print(df_new)
# save the dataframe
df_new.to_excel('test2.xlsx')
Results:
data
0 2016-12-12 (r=0.1)
1 2016-12-13* (r=0.7)
date r
0 2016-12-12 r=0.1
1 2016-12-13 r=0.7

how to convert link list to matrix in python

my input data looks like(input.txt)：
AGAP2 TCGA-BL-A0C8-01A-11R-A10U-07 66.7328
AGAP2 TCGA-BL-A13I-01A-11R-A13Y-07 186.8366
AGAP3 TCGA-BL-A13J-01A-11R-A10U-07 183.3767
AGAP3 TCGA-BL-A3JM-01A-12R-A21D-07 33.2927
AGAP3 TCGA-BT-A0S7-01A-11R-A10U-07 57.9040
AGAP3 TCGA-BT-A0YX-01A-11R-A10U-07 99.8540
AGAP4 TCGA-BT-A20J-01A-11R-A14Y-07 88.8278
AGAP4 TCGA-BT-A20N-01A-11R-A14Y-07 129.7021
i want the output.txt looks like :
TCGA-BL-A0C8-01A-11R-A10U-07 TCGA-BL-A13I-01A-11R-A13Y-07 ...
AGAP2 66.7328 186.8366
AGAP3 0 0

Using pandas: read csv, create pivot and write csv.
import pandas as pd
df = pd.read_table("input.txt", names="xy", sep=r'\s+')
# reset index first - we need named column
new = df.reset_index().pivot(index="index", columns='x', values='y')
new.fillna(0, inplace=True)
new.to_csv("output.csv", sep='\t') # tab separated
Reshaping and Pivot Tables
EDIT: filling empty values

How can I add columns in a data frame?

I have the following data:
Example:
DRIVER_ID;TIMESTAMP;POSITION
156;2014-02-01 00:00:00.739166+01;POINT(41.8836718276551 12.4877775603346)
I want to create a pandas dataframe with 4 columns that are the id, time, longitude, latitude.
So far, I got:
cur_cab = pd.DataFrame.from_csv(
path,
sep=";",
header=None,
parse_dates=[1]).reset_index()
cur_cab.columns = ['cab_id', 'datetime', 'point']
path specifies the .txt file containing the data.
I already wrote a function that returns the longitude and latitude values from the point formated string.
How do I expand the data frame with the additional column and the splitted values ?

After loading, if you're using a recent version of pandas then you can use the vectorised str methods to parse the column:
In [87]:
df['pos_x'], df['pos_y']= df['point'].str[6:-1].str.split(expand=True)
df
Out[87]:
cab_id datetime \
0 156 2014-01-31 23:00:00.739166
point pos_x pos_y
0 POINT(41.8836718276551 12.4877775603346) 0 1
Also you should stop using from_csv it's no longer updated, use the top level read_csv so your loading code would be:
cur_cab = pd.read_csv(
path,
sep=";",
header=None,
parse_dates=[1],
names=['cab_id', 'datetime', 'point'],
skiprows=1)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to read file in pandas with unfix whitespace/s separation? - python

Use read_csv: df = pd.read_csv('file.txt', sep='\\s+', names=['Wavelength', 'Reflectance'], header=None)

Related

How do I capture the properties I want from a string?

Is there any method to replace specific data from column without breaking its structure or spliting

Parsing data in Excel using python

how to convert link list to matrix in python

How can I add columns in a data frame?

Categories

Resources