Splitting a column into 2 in a csv file using python - python

I have a .csv file with 100 rows of data displayed like this
"Jim 1234"
"Sam 1235"
"Mary 1236"
"John 1237"
What I'm trying to achieve is splitting the numbers from the names into 2 columns in python
edit*
Using,
import pandas as pd
df = pd.read_csv('test.csv', sep='\s+')
df.to_csv('result.csv', index=False)
I managed to get it to display like this in excel
However, the numbers still do not show up in column B as I expected.

Your data have only one column and a tab delimiter:
pd.read_csv('test.csv', quoting=1, header=None, squeeze=True) \
.str.split('\t', expand=True) \
.to_csv('result.csv', index=False, header=False)

very simple way,
data=pd.DataFrame(['Jim1234','Sam4546'])
data[0].str.split('(\d+)', expand=True)

if your file resemble to the picture below then the next code will work csv file content
import pandas as pd
df = pd.read_csv('a.csv', header=None, delimiter='\s')
df
code execution

Related

df = pd.read_csv('file.csv') adds random numbers and commas to file. Pandas in python

I am trying to read a csv file using pandas as so:
df = pd.read_csv('file.csv')
Here is the file before:
,schoolId,Name,Meetings Present
0,991,Jimmy Nuetron,2
1,992,Jimmy Fuetron,6
2,993,Cam Nuetron,4
Here is the file after:
,Unnamed: 0,schoolId,Name,Meetings Present
0,0.0,991.0,Jimmy Nuetron,2.0
1,1.0,992.0,Jimmy Fuetron,6.0
2,2.0,993.0,Cam Nuetron,4.0
0,,,,3
Why is it adding the numbers and columns when I run the read_csv method?
How can I prevent this without adding a seperator?
pandas.read_csv is actuallly not adding the column Unnamed: 0 because it already exists in your .csv (who apparently/probably was generated by the method pandas.DataFrame.to_csv).
You can get rid of this (extra) column by making it as an index :
df = pd.read_csv('file.csv', index_col=0)

Extra column appears when appending selected row from one csv to another in Python

I have this code which appends a column of a csv file as a row to another csv file:
def append_pandas(s,d):
import pandas as pd
df = pd.read_csv(s, sep=';', header=None)
df_t = df.T
df_t.iloc[0:1, 0:1] = 'Time Point'
df_t.at[1, 0] = 1
df_t.columns = df_t.iloc[0]
df_new = df_t.drop(0)
pdb = pd.read_csv(d, sep=';')
newpd = pdb.append(df_new)
from pandas import DataFrame
newpd.to_csv(d, sep=';')
The result is supposed to look like this:
Instead, every time the row is appended, there is an extra "Unnamed" column appearing on the left:
Do you know how to fix that?..
Please, help :(
My csv documents from which I select a column look like this:
You have to add index=False to your to_csv() method

pandas drop rows based on cell content and no headers

I'm reading a csv file with pandas that has no headers.
df = pd.read_csv('file.csv', header=0)
csv file containing 1 row with several users:
admin
user
system
sysadmin
adm
administrator
I need to read the file to a df or a list except for example: sysadmin
and save the result to the csv file
admin
user
system
adm
administrator
Select first columns, filter by boolean indexing and write to file:
df = pd.read_csv('file.csv', header=0)
df[df.iloc[:, 0].ne('sysadmin')].to_csv(file, index=False)
#if there is csv header converted to column name
#df[df['colname'].ne('sysadmin')].to_csv(file, index=False)
If no header in csv need parameters like:
df = pd.read_csv('file.csv', header=None)
df[df.iloc[:, 0].ne('sysadmin')].to_csv(file, index=False, header=False)
you can give it a try:---
df = pd.read_csv('file.csv')
df=df.transpose()
df.columns=['admin','user','system','sysadmin','adm','administrator']
df.head()
I think this will work it out.
if not then try it:
df = pd.read_csv('file.csv')
df.columns=['admin','user','system','sysadmin','adm','administrator']
df.head()

pandas read csv is confused when commas within quotes

col1, col2, geometry
11.54000000,0.00000000,"{"type":"Polygon","coordinates":[[[-61.3115751786311,-33.83968838375797],[-61.29737019968823,-33.83207774370677],[-61.29443049860791,-33.83592770721248],[-61.29241347742871,-33.83489393774538],[-61.28994584513501,-33.83806650089736],[-61.292499308117186,-33.83938539699006],[-61.28958106470898,-33.8431993873636],[-61.29307859612687,-33.84495487100211],[-61.295256567865046,-33.846135537383866],[-61.296388484054326,-33.84676149889543],[-61.296747927196776,-33.84651421268175],[-61.297498943449426,-33.84670133707654],[-61.297992472179686,-33.847120134589964],[-61.299741220055196,-33.84901812154847],[-61.3012164422457,-33.85018089588664],[-61.3015892874819,-33.850566250375365],[-61.30284190607861,-33.85079121660985],[-61.30496105223345,-33.848193766906206],[-61.306084952130036,-33.84682375029292],[-61.30707604410075,-33.845532812572294],[-61.30672627175046,-33.84527169005647],[-61.306290670206494,-33.845188781884744],[-61.304604048903514,-33.847304098561025],[-61.30309763921784,-33.84654473836309],[-61.30013213880613,-33.84478736144466],[-61.30110629620797,-33.8431690707163],[-61.303046037678854,-33.844170576767105],[-61.30433047221653,-33.84266156764314],[-61.30484242472771,-33.842899106713375],[-61.30696068650711,-33.844104878773436],[-61.306418212892446,-33.84505221083753],[-61.307163201216696,-33.845464893960255],[-61.30760172622554,-33.84490909256552],[-61.307932962646014,-33.844513681420494],[-61.309176116985405,-33.84280834206188],[-61.30596211112515,-33.841126948963954],[-61.3056475423994,-33.841449215098756],[-61.30526859890979,-33.841557611902374],[-61.30483601097522,-33.84149669494795],[-61.30448925534122,-33.84120408616046],[-61.30410688411086,-33.840609953572034],[-61.30400151682434,-33.839925243738094],[-61.30240379835875,-33.83889223688216],[-61.30188418287129,-33.838444480832685],[-61.301130848179525,-33.83943255499186],[-61.30078636095504,-33.83996223583909],[-61.30059265818967,-33.84016469670277],[-61.30048478527255,-33.840438447848506],[-61.300252198180424,-33.84026774340676],[-61.29876711207748,-33.839489883020924],[-61.29799408649143,-33.840597902688785],[-61.297669258508,-33.84103160870988],[-61.297566592962134,-33.84112444052047],[-61.29748538503245,-33.841083604060834],[-61.297140578061956,-33.84134946797752],[-61.29709617977233,-33.84160419097128],[-61.297170540239335,-33.84168254110631],[-61.297341460506956,-33.84179653572337],[-61.297243418161194,-33.84197105818567],[-61.29699517169225,-33.84200300239938],[-61.29680176950715,-33.84179064473802],[-61.29691703393983,-33.8416707218475],[-61.297053755769845,-33.841604265738546],[-61.29707920124143,-33.84154875978832],[-61.29709391784669,-33.84147543150246],[-61.29711262215961,-33.84133768608576],[-61.296951411710374,-33.84119216012805],[-61.297262269660294,-33.84089514360839],[-61.297626491077864,-33.84051497848962],[-61.29865532547658,-33.83935363544152],[-61.30027710358755,-33.84011486145675],[-61.30046658230606,-33.83996490243917],[-61.30063460268783,-33.83979712050095],[-61.300992098665965,-33.8393813535522],[-61.301799802937595,-33.83832425565103],[-61.30135527704997,-33.837671541923235],[-61.30082030025984,-33.83731962483044],[-61.299512855628244,-33.83689640801839],[-61.29879550338594,-33.8363083288346],[-61.29831419490918,-33.835559835856905],[-61.298360098160686,-33.83408067231082],[-61.29976541168753,-33.83467181800819],[-61.30104200723692,-33.83586895614681],[-61.30133434017162,-33.83606352507277],[-61.30153415160492,-33.836339043812224],[-61.30164813329583,-33.83657891551336],[-61.30124575062752,-33.83743146168004],[-61.30195917352424,-33.83831965157767],[-61.30196183786503,-33.83843401993221],[-61.30250094586367,-33.83890484694379],[-61.304002690127376,-33.83984352469762],[-61.30473149692381,-33.8397514189025],[-61.3054487998093,-33.839941491549894],[-61.30582354557356,-33.84016574092716],[-61.30604808932503,-33.84046128014441],[-61.306143888278996,-33.840801374736316],[-61.30598219492593,-33.841088001849094],[-61.30757239940571,-33.841967156609876],[-61.30920555104759,-33.84277500140921],[-61.3115751786311,-33.83968838375797],[-61.3115751786311,-33.83968838375797]]]}"
How do I read a csv with syntax like above?
I am doing:
import pandas as pd
df = pd.read_csv('file.csv')
However, read_csv gets confused with the , within "{"type":"Polygon","coordinates": I want it to ignore the , within the quotes.
Your csv file contains a MultiIndex, which is causing your read and split issues.
I have tried multiple methods to read your file correctly. The best method that I have found so far is using the Python engine with an advanced separator in the read_csv function.
import pandas as pd
# these are for viewing the output
pd.set_option('display.max_columns', 30)
pd.set_option('display.max_rows', 100)
pd.set_option('display.width', 120)
# The separator matches the format of the string that you provided.
# I'm sure that it can be modified to be more efficient.
df = pd.read_csv('test.csv', skiprows=1, sep='(\d{1,2}.\d{1,8}),(\d{1,2}.\d{1,8}),("{"type":.*)',engine="python")
# some cleanup
df = df.drop(df.columns[0], axis=1)
# I had to save the processed file
df.to_csv('test_01.csv')
# read in the new file
df = pd.read_csv('test_01.csv', header=None, index_col=0)
print(df.to_string(index=False))
11.54 0.0 "{"type":"Polygon","coordinates":[[[-61.3115751786311,-33.83968838375797],[-61.29737019968823,-33.83207774370677],[-61.29443049860791,-33.83592770721248],[-61.29241347742871,-33.83489393774538],[-61.28994584513501,-33.83806650089736],[-61.292499308117186,-33.83938539699006],[-61.28958106470898,-33.8431993873636],[-61.29307859612687,-33.84495487100211],[-61.295256567865046,-33.846135537383866],[-61.296388484054326,-33.84676149889543],[-61.296747927196776,-33.84651421268175],[-61.297498943449426,-33.84670133707654],[-61.297992472179686,-33.847120134589964],[-61.299741220055196,-33.84901812154847],[-61.3012164422457,-33.85018089588664],[-61.3015892874819,-33.850566250375365],[-61.30284190607861,-33.85079121660985],[-61.30496105223345,-33.848193766906206],[-61.306084952130036,-33.84682375029292],[-61.30707604410075,-33.845532812572294],[-61.30672627175046,-33.84527169005647],[-61.306290670206494,-33.845188781884744],[-61.304604048903514,-33.847304098561025],[-61.30309763921784,-33.84654473836309],[-61.30013213880613,-33.84478736144466],[-61.30110629620797,-33.8431690707163],[-61.303046037678854,-33.844170576767105],[-61.30433047221653,-33.84266156764314],[-61.30484242472771,-33.842899106713375],[-61.30696068650711,-33.844104878773436],[-61.306418212892446,-33.84505221083753],[-61.307163201216696,-33.845464893960255],[-61.30760172622554,-33.84490909256552],[-61.307932962646014,-33.844513681420494],[-61.309176116985405,-33.84280834206188],[-61.30596211112515,-33.841126948963954],[-61.3056475423994,-33.841449215098756],[-61.30526859890979,-33.841557611902374],[-61.30483601097522,-33.84149669494795],[-61.30448925534122,-33.84120408616046],[-61.30410688411086,-33.840609953572034],[-61.30400151682434,-33.839925243738094],[-61.30240379835875,-33.83889223688216],[-61.30188418287129,-33.838444480832685],[-61.301130848179525,-33.83943255499186],[-61.30078636095504,-33.83996223583909],[-61.30059265818967,-33.84016469670277],[-61.30048478527255,-33.840438447848506],[-61.300252198180424,-33.84026774340676],[-61.29876711207748,-33.839489883020924],[-61.29799408649143,-33.840597902688785],[-61.297669258508,-33.84103160870988],[-61.297566592962134,-33.84112444052047],[-61.29748538503245,-33.841083604060834],[-61.297140578061956,-33.84134946797752],[-61.29709617977233,-33.84160419097128],[-61.297170540239335,-33.84168254110631],[-61.297341460506956,-33.84179653572337],[-61.297243418161194,-33.84197105818567],[-61.29699517169225,-33.84200300239938],[-61.29680176950715,-33.84179064473802],[-61.29691703393983,-33.8416707218475],[-61.297053755769845,-33.841604265738546],[-61.29707920124143,-33.84154875978832],[-61.29709391784669,-33.84147543150246],[-61.29711262215961,-33.84133768608576],[-61.296951411710374,-33.84119216012805],[-61.297262269660294,-33.84089514360839],[-61.297626491077864,-33.84051497848962],[-61.29865532547658,-33.83935363544152],[-61.30027710358755,-33.84011486145675],[-61.30046658230606,-33.83996490243917],[-61.30063460268783,-33.83979712050095],[-61.300992098665965,-33.8393813535522],[-61.301799802937595,-33.83832425565103],[-61.30135527704997,-33.837671541923235],[-61.30082030025984,-33.83731962483044],[-61.299512855628244,-33.83689640801839],[-61.29879550338594,-33.8363083288346],[-61.29831419490918,-33.835559835856905],[-61.298360098160686,-33.83408067231082],[-61.29976541168753,-33.83467181800819],[-61.30104200723692,-33.83586895614681],[-61.30133434017162,-33.83606352507277],[-61.30153415160492,-33.836339043812224],[-61.30164813329583,-33.83657891551336],[-61.30124575062752,-33.83743146168004],[-61.30195917352424,-33.83831965157767],[-61.30196183786503,-33.83843401993221],[-61.30250094586367,-33.83890484694379],[-61.304002690127376,-33.83984352469762],[-61.30473149692381,-33.8397514189025],[-61.3054487998093,-33.839941491549894],[-61.30582354557356,-33.84016574092716],[-61.30604808932503,-33.84046128014441],[-61.306143888278996,-33.840801374736316],[-61.30598219492593,-33.841088001849094],[-61.30757239940571,-33.841967156609876],[-61.30920555104759,-33.84277500140921],[-61.3115751786311,-33.83968838375797],[-61.3115751786311,-33.83968838375797]]]}"
Try this:
pd.read_csv('file.csv',quotechar='"',skipinitialspace=True)

Read text file into excel without putting all data in first column

I have written some code that requests a text file from the web, reads it, and outputs the necessary data into an excel spreadsheet. However, it is not quite in the format I need as it is writing each row as one item, and putting it all into the first column. My code is below, along with an image of the output as my code currently stands. I'd like a column for the date, and then one for each livestock animal with the amount processed.
import pandas as pd
import os
import urllib.request
filename = "Livestock Slaughter.txt"
os.chdir(r'S:\1WORKING\FINANCIAL ANALYST\Shawn Schreier\Commodity Dashboard')
directory = os.getcwd()
url = 'https://www.ams.usda.gov/mnreports/sj_ls710.txt'
data=urllib.request.urlretrieve(url, "Slaughter Rates.txt")
df = pd.read_csv("Slaughter Rates.txt", sep='\t', skiprows=5, nrows=3)
df.to_excel('Slaughter Data.xlsx')
As in comments, you can use delim_whitespace=True to load the CSV and then do some post-processing to get correct data. You can also put URL to pd.read_csv() directly:
import pandas as pd
df = pd.read_csv('https://www.ams.usda.gov/mnreports/sj_ls710.txt', delim_whitespace=True, skiprows=5, nrows=3).reset_index()
df = pd.concat([df.loc[:, 'level_0':'level_2'].agg(' '.join, axis=1), df.iloc[:, 3:]], axis=1)
print(df)
df.to_csv('data.csv')
Prints:
0 CATTLE CALVES HOGS SHEEP
0 Tuesday 07/21/2020 (est 118,000 2,000 478,000 7,000
1 Week ago (est) 119,000 2,000 475,000 8,000
2 Year ago (act) 121,000 3,000 476,000 8,000
And saves the data as data.csv (screenshot from LibreOffice):

Categories