I have the following string:
str="Key=xxxx, age=11, key=yyyy , age=22,Key=zzzz, age=01, key=qqqq, age=21,Key=wwwww, age=91, key=pppp, age=22"
I want to convert this string to a pandas DataFrame with KEY and AGE as the column names.
Each key is paired with the age that follows it.
How can I achieve this conversion?
You can try regex
import re
import pandas as pd
s = "Key=xxxx, age=11, key=yyyy , age=22,Key=zzzz, age=01, key=qqqq, age=21,Key=wwwww, age=91, key=pppp, age=22"
df = pd.DataFrame(zip(re.findall(r'Key=([^,\s]+)', s, re.IGNORECASE), re.findall(r'age=([^,\s]+)', s, re.IGNORECASE)),
columns=['key', 'age'])
df
key age
0 xxxx 11
1 yyyy 22
2 zzzz 01
3 qqqq 21
4 wwwww 91
5 pppp 22
Use a regex that finds all key/age pairs, "key=(\w+)\s*,\s*age=(\w+)", then use the matches to build the DataFrame:
import re
import pandas as pd
content = "Key=xxxx, age=11, key=yyyy , age=22,Key=zzzz, age=01, key=qqqq, age=21,Key=wwwww, age=91, key=pppp, age=22"
pat = re.compile(r"key=(\w+)\s*,\s*age=(\w+)", flags=re.IGNORECASE)
values = pat.findall(content)
df = pd.DataFrame(values, columns=['key', 'age'])
print(df)
# - - - - -
key age
0 xxxx 11
1 yyyy 22
2 zzzz 01
3 qqqq 21
4 wwwww 91
5 pppp 22
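As a variation on the paired-regex answer above, here is a minimal sketch that uses named groups, so each match maps directly to a row. The group names `key` and `age` and the shortened input string are choices made for this illustration only; they are not part of the original answers.

```python
import re
import pandas as pd

# Shortened sample input (assumption: same "key=..., age=..." layout as in the question)
content = "Key=xxxx, age=11, key=yyyy , age=22,Key=zzzz, age=01"

# Named groups make every match self-describing
pat = re.compile(r"key=(?P<key>\w+)\s*,\s*age=(?P<age>\w+)", flags=re.IGNORECASE)

# One dict per matched key/age pair
rows = [m.groupdict() for m in pat.finditer(content)]

df = pd.DataFrame(rows, columns=["key", "age"])
print(df)
```

Because the pattern matches a key and its age together, a key without a following age is skipped rather than shifting later ages onto the wrong key. The ages stay strings, which keeps the leading zero in "01".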
Hello, this is my CSV data:
Age Name
0 22 George
1 33 lucas
2 22 Nick
3 12 Leo
4 32 Adriano
5 53 Bram
6 11 David
7 32 Andrei
8 22 Sergio
I want to use an if/else statement: for example, if George is an adult, create a new row and insert +.
I mean:
Age Name Adul
22 George +
What is the best way?
This is the code I am using to read the data from the CSV:
import pandas as pd
produtos = pd.read_csv('User.csv', nrows=9)
print(produtos)
for i, produto in produtos.iterrows():
    print(i, produto['Age'], produto['Name'])
IIUC, you want to create a new column (not a row) called "Adul". You can do this with numpy.where:
import numpy as np
# use "" for the non-adult case: mixing "+" with np.nan would yield the string "nan"
produtos["Adul"] = np.where(produtos["Age"].ge(18), "+", "")
Edit:
To only do this for a specific name, you could use:
name = input("Name")
if name in produtos["Name"].tolist():
    # reduce the comparison to a single bool; a bare Series in `if` raises ValueError
    if (produtos.loc[produtos["Name"] == name, "Age"] >= 18).all():
        produtos.loc[produtos["Name"] == name, "Adul"] = "+"
You can do this:
produtos["Adul"] = np.where(produtos["Age"] >= 18, "+", "")  # "" rather than np.nan, which np.where would turn into the string "nan"
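Since the question asks for an if/else statement, here is a small sketch that puts an explicit if/else next to the vectorized `np.where` form. The three-row frame and the helper name `adult_flag` are made up for this illustration, and the age threshold of 18 is taken from the answers above.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for User.csv (assumption: same "Age" and "Name" columns)
produtos = pd.DataFrame({"Age": [22, 12, 53], "Name": ["George", "Leo", "Bram"]})

def adult_flag(age):
    """Return "+" for adults and an empty string otherwise."""
    if age >= 18:
        return "+"
    else:
        return ""

# Row-by-row version with a plain if/else
produtos["Adul"] = produtos["Age"].apply(adult_flag)

# Vectorized equivalent; both branches are strings, so no "nan" text appears
vectorized = np.where(produtos["Age"] >= 18, "+", "")

print(produtos)
```

The `apply` version reads like the if/else the question describes, while `np.where` is usually the faster choice on large frames.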
It is possibly done with regular expressions, which I am not very strong at.
My dataframe is like this:
import pandas as pd
import regex as re
data = {'postcode': ['DG14','EC3M','BN45','M2','WC2A','W1C','PE35'], 'total':[44, 54,56, 78,87,35,36]}
df = pd.DataFrame(data)
df
postcode total
0 DG14 44
1 EC3M 54
2 BN45 56
3 M2 78
4 WC2A 87
5 W1C 35
6 PE35 36
I want to get these strings in my column with the last letter stripped like so:
postcode total
0 DG14 44
1 EC3 54
2 BN45 56
3 M2 78
4 WC2 87
5 W1C 35
6 PE35 36
Probably something using re.sub('', '\D')?
Thank you.
You could use str.replace here:
df["postcode"] = df["postcode"].str.replace(r'[A-Za-z]$', '', regex=True)  # regex=True is required in pandas >= 2.0, where the default is a literal match
One of the approaches:
import pandas as pd
import re
data = {'postcode': ['DG14','EC3M','BN45','M2','WC2A','W1C','PE35'], 'total':[44, 54,56, 78,87,35,36]}
data['postcode'] = [re.sub(r'[a-zA-Z]$', '', item) for item in data['postcode']]
df = pd.DataFrame(data)
print(df)
Output:
postcode total
0 DG14 44
1 EC3 54
2 BN45 56
3 M2 78
4 WC2 87
5 W1 35
6 PE35 36
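The same substitution can also be done on the column itself instead of on the input list. This is a minimal sketch using the frame from the question; passing `regex=True` explicitly keeps the behaviour the same across pandas versions.

```python
import pandas as pd

data = {
    "postcode": ["DG14", "EC3M", "BN45", "M2", "WC2A", "W1C", "PE35"],
    "total": [44, 54, 56, 78, 87, 35, 36],
}
df = pd.DataFrame(data)

# Remove a single trailing ASCII letter; values ending in a digit are untouched
df["postcode"] = df["postcode"].str.replace(r"[A-Za-z]$", "", regex=True)

print(df)
```

Note that this pattern also turns W1C into W1, as in the output above. If W1C has to stay as it is, as the desired table in the question suggests, the pattern would need an extra condition that the question does not spell out.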
sample file:
03|02|2|02|F|3|47|P| |AG|AFL|24|20201016| 1 |West |CH|India - LA |CNDO
code:
df1 = pd.read_csv("GM3.txt",sep="|",dtype=object)
df1.to_csv('file_validation.csv',index=None)
output in csv:
3 2 2 2 F 3 47 P AG AFL 24 20201016 1 West CH India - LA CNDO 302
When I print df1.to_csv(), I get the output below:
0 03 02 2 CH India - LA CNDO
I want the values in the CSV to be stored as strings, i.e. 03,02 with the leading zeros kept, instead of as integers.
Your code works for me:
import pandas as pd
df1 = pd.read_csv("GM3.txt",sep="|",dtype=object)
df1.to_csv('file_validation.csv',index=None)
produces
I was wondering if anyone had any suggestions on how to do the following:
I have multiple files: R1.csv, R2.csv and R3.csv
Each file has the following content in the same format:
For example:
R1.csv:
data_label pt1 pt2
DATA00_A1 1 2
DATA01_A1 11 22
DATA02_A1 111 222
R2.csv:
data_label pt1 pt2
DATA00_A2 1 2
DATA01_A2 11 22
DATA02_A2 111 222
So far to access these files and retrieve the data I have been using pandas:
import pandas as pd
dfObject=pd.read_csv('R1.csv',delimiter=' ')
labels=dfObject.data_label
datax=dfObject.pt1
datay=dfObject.pt2
But now I need to have all the data in one file. For example:
Rall.csv:
data_label pt1 pt2
DATA00_A1 1 2
DATA01_A1 11 22
DATA02_A1 111 222
DATA00_A2 1 2
DATA01_A2 11 22
DATA02_A2 111 222
I am not sure how to begin, so I would appreciate your suggestions, thanks!
Try this:
import pandas as pd
R1=pd.read_csv('R1.csv',delimiter=' ')
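R2=pd.read_csv('R2.csv',delimiter=' ')
Rall = {}
# append each column of R2 below the matching column of R1
for x in R1.columns:
    Rall[x] = list(R1[x])+list(R2[x])
Rall = pd.DataFrame(Rall)
# index=False keeps the row index out of the file, matching the desired Rall.csv
Rall.to_csv("Rall.csv", sep=" ", index=False)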
print(Rall)
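A shorter route to the same result is `pd.concat`, which stacks any number of frames with matching columns. The sketch below uses in-memory text in place of R1.csv and R2.csv so that it runs on its own; with real files, the `io.StringIO(...)` arguments would be replaced by the file names.

```python
import io
import pandas as pd

# In-memory stand-ins for R1.csv and R2.csv (assumption: space-delimited, same header)
r1_text = "data_label pt1 pt2\nDATA00_A1 1 2\nDATA01_A1 11 22\nDATA02_A1 111 222\n"
r2_text = "data_label pt1 pt2\nDATA00_A2 1 2\nDATA01_A2 11 22\nDATA02_A2 111 222\n"

frames = [pd.read_csv(io.StringIO(text), delimiter=" ") for text in (r1_text, r2_text)]

# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 1, 2
rall = pd.concat(frames, ignore_index=True)

# Write in the same space-delimited layout, without the row index
out = io.StringIO()
rall.to_csv(out, sep=" ", index=False)
print(out.getvalue())
```

This extends to R3.csv and beyond by adding more entries to the list that is passed to `pd.concat`.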
My text file
Name Surname Age Sex Grade X
Chris M. 14 M 4 10 05 2010
Adam A. 17 M 11 12 2011
Jack O. M 8 08 04 2009
...
I want to count years.
Example output:
{ '2010' : 1 , "2011" : 1 ...}
but I got "Key Error : Year".
import pandas as pd
df = pd.read_fwf("file.txt")
df.join(df['X'].str.split(' ', 2, expand = True).rename(columns={0: '1', 1: '2', 2: '3}))
df.columns=["1"]
df["1"].value_counts().dict()
What's wrong with my code?
df.join(...) returns a new DataFrame and leaves df unchanged, so you have to assign the result back to df. After that, df has the Year column. Try this:
import pandas as pd
df = pd.read_fwf("file.txt")
df = df.join(df['Date'].str.split(' ', n=2, expand = True).rename(columns={1: 'Year', 0: 'Month'}))  # n must be passed by keyword in pandas >= 2.0
df["Year"].value_counts().to_dict()
output:
{'2009': 1, '2010': 1, '2011': 1}
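To show the split-and-count step in isolation, here is a minimal sketch on a standalone Series. It assumes each value has three space-separated fields with the year last, as in the sample rows of the question; the column labels `first`, `second` and `year` are made up for this illustration.

```python
import pandas as pd

# Stand-in for the date-like column (assumption: three fields, year last)
dates = pd.Series(["10 05 2010", "11 12 2011", "08 04 2009"], name="X")

# expand=True returns one column per field
parts = dates.str.split(" ", n=2, expand=True)
parts.columns = ["first", "second", "year"]

# Count how often each year occurs and convert the result to a plain dict
counts = parts["year"].value_counts().to_dict()
print(counts)
```

The years stay strings here, which matches the dictionary keys in the output above.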