Extracting individual rows from dataframe

I am currently working on one of my final assignments and have a CSV file with several columns of data.
I want to extract a single column and write each of its rows to a separate txt file.
Here is my code:
import pandas as pd
import csv
df = pd.read_csv("AUS_NZ.csv")
print(df.head(10))
print(df["content"])
num_of_review = len(df["content"])
print(num_of_review)
for i in range(num_of_review):
    with open("{}.txt".format(i), "a", encoding="utf-8") as f:
        f.write(df["content"][i])
There is no issue with extracting the individual rows. But when I examined the txt files that were written and looked at their content, I noticed that the text was copied (which is what I want) but that it was copied twice (which is not what I want).
Example:
"This is an example of what the dataframe have at that particular column which I want to convert to a txt file."
This is what was copied to the txt file:
"This is an example of what the dataframe have at that particular column which I want to convert to a txt file.This is an example of what the dataframe have at that particular column which I want to convert to a txt file."
Any advice on how to copy the content only once?

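Thanks! While thinking about how to rectify this, I came to the same conclusion as you. I switched the mode from "a" to "w" and it solved the issue. Presumably the files still held the text from an earlier run: "a" adds new text after whatever a file already contains, while "w" overwrites the file.
I am too used to append, so I tried that before I tried write.
The correct code: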
import pandas as pd
import csv
df = pd.read_csv("AUS_NZ.csv")
print(df.head(10))
print(df["content"])
num_of_review = len(df["content"])
print(num_of_review)
for i in range(num_of_review):
    with open("{}.txt".format(i), "w", encoding="utf-8") as f:
        f.write(df["content"][i])
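
The difference between the two modes can be seen in a small sketch that needs no CSV file. It assumes the duplication came from running the script more than once, which the thread does not confirm. The helper write_twice and the file names a.txt and w.txt are made up for illustration.

```python
import os
import tempfile


def write_twice(path, text, mode):
    """Write `text` to `path` twice, as if the script had been run two times."""
    for _ in range(2):
        with open(path, mode, encoding="utf-8") as f:
            f.write(text)
    with open(path, encoding="utf-8") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    review = "An example review."
    # "a" keeps what is already in the file and adds the new text after it
    appended = write_twice(os.path.join(tmp, "a.txt"), review, "a")
    # "w" truncates the file first, so only the latest text survives
    overwritten = write_twice(os.path.join(tmp, "w.txt"), review, "w")

print(appended)     # the review text, repeated
print(overwritten)  # the review text, once
```

With "w", rerunning the script is safe because every file is rebuilt from scratch on each run.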

Related

Extracting a column from a collection of csv files and constructing a new table with said data

I'm a newbie when it comes to Python, with a bit more experience in MATLAB. I'm trying to write a script that loops through a folder, picks up all the .csv files, and builds a new table from column 14 of each file: column 14 of csv file 1 becomes column 1 of the new table, column 14 of csv file 2 becomes column 2, and so on. Ideally, the header of each column in the new table would be the name of the file that the column was extracted from.
I've taken into account that Python is zero-based and double-checked that it reads the desired column, but as my code stands I can only get it to print all the files' 14th columns in one array, and I'm not sure how to split that up into a table. Perhaps via a dataframe, although I'm not entirely sure how they work.
Any help would be greatly appreciated!
Code attached below:
import os
import sys
import csv
pathName = "D:/GLaDOS-CAMPUS/data/TestData-AB/"
numFiles = []
fileNames = os.listdir(pathName)
for fileName in fileNames:
    if fileName.endswith(".csv"):
        numFiles.append(fileName)
print(numFiles)
for i in numFiles:
    # "rU" mode was removed in Python 3.11; the csv module expects newline=""
    with open(os.path.join(pathName, i), newline="") as file:
        reader = csv.reader(file, delimiter=',')
        for row in reader:
            print(row[13])  # index 13 is the 14th column

Finding files
I can't test your way of finding files, since I don't have a folder of csv files. But I can say it is much better to use glob to get the list of files:
from glob import glob
files = glob("/Path/To/Files/*.csv")
This will return all csv files.
Reading CSV files
Now we need a way to read all the files and get column 14 (zero-based index 13) from each. It may be overkill, but I prefer to use pandas and numpy for this.
To read a single column of a csv file with pandas, one can use:
pd.read_csv(file, usecols=[COL])
Now we can loop over the files and get column 14 of each:
columns = [pd.read_csv(file, usecols=[13]).values[:, 0] for file in files]
Notice that we converted all the values to numpy arrays.
Merging all columns
In columns, each file's column is one element of a list, so technically they are rows, not columns.
Now we take the transpose of the array so that they become columns (this assumes every file has the same number of rows):
pd.DataFrame(np.transpose(columns))
The code
The whole code would look like:
from glob import glob
import pandas as pd
import numpy as np
files = glob("/Path/To/Files/*.csv")
columns = [pd.read_csv(file, usecols=[13]).values[:, 0] for file in files]
print(pd.DataFrame(np.transpose(columns)))
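
The question also asked for the file names as headers, which the code above does not produce. A possible sketch follows; the helper collect_column and the demo files a.csv and b.csv are made up for illustration. It builds the table with pd.concat instead of np.transpose, so files of different lengths are padded with NaN rather than failing. For the folder in the question you would pass the glob result and col_index=13.

```python
import os
import tempfile

import pandas as pd


def collect_column(paths, col_index):
    """Return a DataFrame with one column per file, named after the file."""
    series = {}
    for path in sorted(paths):
        # usecols takes zero-based positions; iloc[:, 0] is the one column read
        column = pd.read_csv(path, usecols=[col_index]).iloc[:, 0]
        series[os.path.basename(path)] = column.reset_index(drop=True)
    # concat along axis=1 aligns on the row index and pads shorter files with NaN
    return pd.concat(series, axis=1)


with tempfile.TemporaryDirectory() as tmp:
    # Two small made-up files with two columns each; the second file is shorter
    for name, values in [("a.csv", [1, 2, 3]), ("b.csv", [4, 5])]:
        pd.DataFrame({"x": 0, "y": values}).to_csv(os.path.join(tmp, name), index=False)
    paths = [os.path.join(tmp, n) for n in os.listdir(tmp)]
    table = collect_column(paths, col_index=1)

print(table)
```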

How can I read in csv files with columns as variable names in Python?

This is a Python question. I have a csv file and would like to read it in. The first row of the file contains strings, which I would like to use as variable names. The other rows contain integers, and I would like each column of them to become a vector named after its variable.
Thanks,
Tim

You first need to extract the first row. I suggest counting the characters in the first row and using this code to read them:
f = open("demofile.txt", "r")
header = f.read(5)  # put the number of characters you counted inside f.read(n)
f.close()
print(header)
Once you have read it successfully and saved it in a variable, use a regex to split it on ",":
import re
names = re.split(",", header)
print(names)
After that, use dictionary methods to attain your desired result.

You can simply use pandas to read .csv files. Just install pandas using 'pip install pandas'. Then use the following code:
import pandas as pd
dataframe = pd.read_csv('data.csv')
# Returns a list containing names of the columns
column_names = list(dataframe.columns.values)
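
To get the vectors as well as the names, one option is to turn the dataframe into a dictionary keyed by column name, rather than creating actual variables. This is only a sketch: the sample data below is made up and stands in for data.csv.

```python
import io

import pandas as pd

# Hypothetical sample standing in for data.csv: a header row, then integer rows
sample = io.StringIO("height,width\n1,10\n2,20\n3,30\n")

dataframe = pd.read_csv(sample)

# Map each column name to a plain list of that column's values
vectors = dataframe.to_dict(orient="list")

print(vectors["height"])
print(vectors["width"])
```

A dictionary keeps the names discoverable (vectors.keys()) and avoids generating variable names from file contents.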

Python pandas create dataframe from csv embedded within a web txt file

I am trying to import CSV-formatted data into a pandas dataframe. The CSV data is located within a .txt file that is located at a web URL. The issue is that I only want to import the part (or parts) of the .txt file that is formatted as CSV (see image below). Essentially, I need to skip the first 9 rows and then import rows 10-16 as CSV.
My code
import csv
import pandas as pd
import io
url = "http://www.bom.gov.au/climate/averages/climatology/windroses/wr15/data/086282-3pmMonth.txt"
df = pd.read_csv(io.StringIO(url), skiprows = 9, sep =',', skipinitialspace = True)
df
I get a lengthy error message that ultimately says "EmptyDataError: No columns to parse from file".
I have looked at similar examples, such as "Read .txt file with Python Pandas - strings and floats", but this is different.

The code above attempts to read CSV data from the URL string itself rather than from the text file fetched from that URL. To see what I mean, take out the skiprows parameter and then show the data frame. You'll see this:
Empty DataFrame
Columns: [http://www.bom.gov.au/climate/averages/climatology/windroses/wr15/data/086282-3pmMonth.txt]
Index: []
Note that the columns are the URL itself.
Import requests (you may have to install it first) and then try this:
import requests
content = requests.get(url).content
df = pd.read_csv(io.StringIO(content.decode('utf-8')), skiprows=9)
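
The question also wanted to stop after row 16, which skiprows alone does not do. A possible sketch is to add nrows, which limits how many data rows are read after the header. The helper read_embedded_csv and the sample text are made up so that the example runs without a network connection; for the real file you would pass the decoded response text, and the right skiprows and nrows values depend on where the table starts and ends.

```python
import io

import pandas as pd


def read_embedded_csv(text, skiprows, nrows):
    """Parse a CSV table embedded in a larger block of text."""
    return pd.read_csv(
        io.StringIO(text),
        skiprows=skiprows,      # lines to skip before the header row
        nrows=nrows,            # data rows to read after the header
        skipinitialspace=True,  # ignore spaces that follow each comma
    )


# Made-up text with the same shape as the problem: preamble, table, footer
sample_text = (
    "Station summary\n"
    "Some descriptive line\n"
    "Another descriptive line\n"
    "Month, Calm, N\n"
    "Jan, 5, 10\n"
    "Feb, 3, 12\n"
    "Footer that is not part of the table\n"
)

df = read_embedded_csv(sample_text, skiprows=3, nrows=2)
print(df)
```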

Delete rows in CSV file after being read by pandas

I want to have one script writing continually to a CSV file and another script reading periodically from that same CSV file.
What I'm looking for is a way to delete the rows I've just read in from the CSV file (not from my pandas dataframe).
Can anybody help?
# Read data in to dataframe
deviceInfo = pd.read_csv("sampleData.csv", nrows = 100)
# Somehow delete those 100 rows from the CSV file

@JoseAngelSanchez is correct that you might want to read the whole csv into a dataframe, but I think this way lets you get a dataframe with the first 100 rows and still delete them from the csv file.
import pandas as pd
df = pd.read_csv("sampleData.csv")
deviceInfo = df.iloc[:100]
df.iloc[100:].to_csv("sampleData.csv")
Note: if you're doing this repeatedly then you'll probably want to write to_csv(..., index=False), or a new index column will be created in the .csv file on each iteration.

You should read the whole document and then delete the rows you don't want:
import pandas as pd
df = pd.read_csv("sampleData.csv")
df = df.iloc[100:]
# index=False stops pandas from writing an extra index column into the file
df.to_csv("sampleData.csv", index=False)
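
Both answers can be wrapped into one reusable step. The sketch below uses a made-up helper, pop_rows, and a small made-up file, and calls it twice to show that repeated use keeps the columns intact. It does nothing about the other script writing at the same time: any rows appended between the read and the rewrite would be lost, so some form of locking or a separate queue file may be needed in practice.

```python
import os
import tempfile

import pandas as pd


def pop_rows(path, n):
    """Return the first n rows of the CSV at `path` and rewrite it without them."""
    df = pd.read_csv(path)
    taken = df.iloc[:n]
    # index=False keeps pandas from adding an extra index column on every rewrite
    df.iloc[n:].to_csv(path, index=False)
    return taken


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sampleData.csv")
    # Made-up data: five rows, two columns
    pd.DataFrame({"device": range(5), "value": range(10, 15)}).to_csv(path, index=False)

    first = pop_rows(path, 2)
    second = pop_rows(path, 2)
    remaining = pd.read_csv(path)

print(first)
print(second)
print(remaining)
```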

Combining multiple .csv files using pandas and keeping the original structure

I have around 60 .csv files which I would like to combine in pandas. So far I've used this:
import pandas as pd
import glob
total_files = glob.glob("something*.csv")
data = []
for csv in total_files:
    list = pd.read_csv(csv, encoding="utf-8", sep='delimiter', engine='python')
    data.append(list)
biggerlist = pd.concat(data, ignore_index=True)
biggerlist.to_csv("output.csv")
This works somewhat. However, the files I would like to combine all have the same structure of 15 columns with the same headers, and when I use this code only one column is filled, with the contents of the entire row, and its name is all the column names run together (e.g. SEARCH_ROW, DATE, TEXT, etc.).
How can I combine these csv files, while keeping the same structure of the original files?
Edit:
So perhaps I should be a bit more specific regarding my data. This is a snapshot of one of the .csv files I'm using:
As you can see, it is just newspaper data, where the last column is 'TEXT', which isn't shown completely when you open the file.
This is part of how it looks when I have combined the data using my code.
Separately, I can read any of these .csv files without a problem using
data = pd.read_csv("something.csv",encoding="utf-8", sep='delimiter', engine='python')

I solved it!
The problem was the number of commas in the text part of my .csv files. So after removing all commas (just using search/replace), I used:
import pandas
import glob
filenames = glob.glob("something*.csv")
# DataFrame.append was removed in pandas 2.0, so read every file and concatenate once
frames = [pandas.read_csv(filename, encoding="utf-8", sep=";") for filename in filenames]
df = pandas.concat(frames)
Thanks for all the help.
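
If the files really are semicolon-delimited, removing the commas may not have been necessary, because with sep=";" a comma inside the text is just an ordinary character. A small sketch under that assumption follows; the helper combine_csv and the two generated files are made up for illustration.

```python
import glob
import os
import tempfile

import pandas as pd


def combine_csv(pattern, sep=";"):
    """Concatenate every CSV matching `pattern`, keeping the shared columns."""
    names = sorted(glob.glob(pattern))
    frames = [pd.read_csv(name, encoding="utf-8", sep=sep) for name in names]
    return pd.concat(frames, ignore_index=True)


with tempfile.TemporaryDirectory() as tmp:
    # Two made-up files with the same headers; the TEXT values contain commas
    for i in range(2):
        with open(os.path.join(tmp, f"something{i}.csv"), "w", encoding="utf-8") as f:
            f.write("DATE;TEXT\n")
            f.write(f"2020-01-0{i + 1};Hello, world, number {i}\n")
    combined = combine_csv(os.path.join(tmp, "something*.csv"))

print(combined)
```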
