Producing pandas DataFrame from table in text file

I have some data in a text file which looks like this:
(v14).K TaskList[Parameter Estimation].(Problem)Parameter Estimation.Best Value
5.00885e-007 3.0914e+007
5.75366e-007 2.99467e+007
6.60922e-007 2.99199e+007
I'm trying to get this data into a pandas dataframe. The code I've written below partially works but has formatting issues:
def parse_PE_results(results_file):
    with open(results_file) as f:
        data=f.readlines()
    parameter_value=[]
    best_value=[]
    for i in data:
        split= i.split('\t')
        parameter_value.append(split[0])
        best_value.append(split[1].rstrip())
    pv=pandas.Series(parameter_value,name=parameter_value[0])
    bv=pandas.Series(best_value,name=best_value[0])
    df=pandas.DataFrame({parameter_value[0]:pv,best_value[0]:bv})
    return df
I get the feeling that there must be an easier, more 'pythonic' way of building a data frame from text files. Would anybody happen to know what that is?

Use pandas.read_csv. The entire parse_PE_results function can be replaced with:
import pandas as pd
df = pd.read_csv(results_file, delimiter='\t')
You'll also get better performance from read_csv than from calling data=f.readlines() and looping through the result line by line.
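
To make the effect concrete, here is a small sketch that feeds the sample rows from the question to read_csv. It assumes the columns in the real file are separated by tab characters (as the split('\t') in the question suggests), and it uses io.StringIO in place of the file on disk.

```python
import io
import pandas as pd

# Sample rows from the question; the tab separators are an assumption
sample = (
    "(v14).K\tTaskList[Parameter Estimation].(Problem)Parameter Estimation.Best Value\n"
    "5.00885e-007\t3.0914e+007\n"
    "5.75366e-007\t2.99467e+007\n"
    "6.60922e-007\t2.99199e+007\n"
)

# The first line becomes the header; the remaining lines are parsed as floats
df = pd.read_csv(io.StringIO(sample), delimiter='\t')
print(df.shape)
print(df.dtypes)
```

Unlike the hand-written loop, the header line is not repeated as a data row and the values come back as numbers rather than strings.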

Related

How can I have Pandas recognize the structure of my data properly?

I have some data saved in ".txt" files. This is how they are stored:
I used the code below to read the data into a DataFrame (using Python's pandas library):
new_df = pd.read_csv(location, sep='\t', lineterminator='\n', names=None)
The problem is that new_df.shape returns (123, 1): pandas does not recognize that the data have 4 columns. How can I fix this?

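It seems your columns are separated by spaces rather than tabs. Use sep=r"\s+" (a raw string, so the backslash is passed through to the regular expression unchanged).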
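
A minimal sketch of the difference, using made-up data (the column names and values below are hypothetical, not taken from the question's file):

```python
import io
import pandas as pd

# Hypothetical space-aligned data with four columns
text = (
    "a    b    c    d\n"
    "1    2    3    4\n"
    "10   20   30   40\n"
    "100  200  300  400\n"
)

# With a tab separator nothing is split, so every line lands in one column
one_col = pd.read_csv(io.StringIO(text), sep='\t')

# With a whitespace regex, runs of spaces act as the delimiter
four_cols = pd.read_csv(io.StringIO(text), sep=r"\s+")

print(one_col.shape)
print(four_cols.shape)
```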

From your screenshot, your data appear to be in fixed-width format.
Try using pandas.read_fwf to read your data file:
pd.read_fwf(location)
You may pass the colspecs=... argument to tell it which character positions each column occupies, but by default the routine infers the column boundaries from the data itself.
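
As a rough illustration, the sketch below reads a small made-up fixed-width table twice: once letting read_fwf infer the boundaries, and once with explicit colspecs. The data and the character positions are assumptions chosen for the example.

```python
import io
import pandas as pd

# Hypothetical fixed-width table: each column starts at the same character position
text = (
    "name   x     y\n"
    "alpha  1.0   10\n"
    "beta   22.5  200\n"
)

# Default behaviour: column boundaries are inferred from the whitespace gaps
inferred = pd.read_fwf(io.StringIO(text))

# Explicit half-open (start, end) character ranges for each column
explicit = pd.read_fwf(io.StringIO(text), colspecs=[(0, 5), (7, 11), (13, 16)])

print(inferred)
print(explicit)
```

Explicit colspecs are mainly useful when some columns contain blanks that would otherwise confuse the inference.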

How do you read rows from a csv file and store it in an array using Python codes?

I have a CSV file, diseases_matrix_KNN.csv, which contains an Excel table.
Now, I would like to store all the numbers from a row in a list, like:
Hypothermia = [0,-1,0,0,0,0,0,0,0,0,0,0,0,0]
I have looked but have been unable to find a solution. Please let me know how I can read this type of data in the chosen form using Python.

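The most common way to work with Excel data is to use pandas.
Here is an example:
import pandas as pd
# index_col=0 assumes the row names (e.g. 'Hypothermia') are in the first column
df = pd.read_excel(filename, index_col=0)
# Label-based lookup needs .loc; .iloc accepts only integer positions
print(df.loc['Hypothermia'])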
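
Since the question mentions a CSV file, the same idea can be sketched with read_csv. The layout below (disease names in the first column, hypothetical symptom columns s1 to s4) is an assumption, because the original table is not shown.

```python
import io
import pandas as pd

# Hypothetical stand-in for diseases_matrix_KNN.csv
csv_text = (
    "disease,s1,s2,s3,s4\n"
    "Hypothermia,0,-1,0,0\n"
    "Flu,1,0,-1,1\n"
)

# Use the first column as the row index so rows can be looked up by name
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# One row as a plain Python list
hypothermia = df.loc["Hypothermia"].tolist()

# Every row, keyed by its name
rows = {name: row.tolist() for name, row in df.iterrows()}

print(hypothermia)
print(rows)
```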

Creating a dataframe from a csv file in pandas: column issue

I have a messy text file that I need to sort into columns in a dataframe so I can do the data analysis I need to do. Here is the messy looking file:
Messy text
I can read it in as a CSV file, which looks a bit nicer, using:
import pandas as pd
data = pd.read_csv('phx_30kV_indepth_0_0_outfile.txt')
print(data)
This prints the data aligned, but the output is [640 rows x 1 column], and I need to separate it into multiple columns so I can manipulate it as a dataframe.
I have tried a number of solutions using StringIO that have worked here before, but nothing seems to be doing the trick.
However, when I do this, there is the issue that the

Pass delim_whitespace=True to read_csv (since pandas 2.2 this option is deprecated in favor of the equivalent sep=r"\s+"):
df = pd.read_csv('phx_30kV_indepth_0_0_outfile.txt', delim_whitespace=True)

Your input file is not actually in CSV format.
Since you provided only a .png picture, it is not even clear whether the file is divided into rows.
If it is not, you have to start by cutting the content into individual lines and then read the output of that cutting step.
I think this is the first step, before you can use either read_csv or read_table (with delim_whitespace=True, of course).
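
If the file really does turn out to be one unbroken run of values, the "cutting" step could look roughly like the sketch below. The raw string, the number of columns, and the column names are all hypothetical; the real file would need its own values for these.

```python
import pandas as pd

# Hypothetical content: nine whitespace-separated values with no line breaks
raw = "1 2.5 3 4 5.5 6 7 8.5 9"
n_cols = 3  # assumed number of fields per row

# Split on any whitespace, then cut the flat token list into rows of n_cols
tokens = raw.split()
rows = [tokens[i:i + n_cols] for i in range(0, len(tokens), n_cols)]

# Column names are placeholders; convert the text tokens to numbers
df = pd.DataFrame(rows, columns=["a", "b", "c"]).astype(float)
print(df)
```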

saving a dataframe to csv file (python)

I am trying to restructure the way my precipitation data is organized in an Excel file. To do this, I've written the following code:
import pandas as pd
# sheet_name=None loads every sheet into a dict (older pandas spelled this "sheetname")
df = pd.read_excel('El Jem_Souassi.xlsx', sheet_name=None, header=None)
data=df["El Jem"]
T=[]
for column in range(1,56):
    liste=data[column].tolist()
    for row in range(1,len(liste)):
        liste[row]=str(liste[row])
        if liste[row]!='nan':
            T.append(liste[row])
result=pd.DataFrame(T)
result
This code works fine and through Jupyter I can see that the result is good
screenshot
However, I am facing a problem when attempting to save this dataframe to a csv file.
result.to_csv("output.csv")
The resulting file contains the vertical index column and it seems I am unable to call for a specific cell.
(Hopefully, someone can help me with this problem)
Many thanks !!

It's all in the docs.
You are interested in skipping the index column, so do:
result.to_csv("output.csv", index=False)
If you also want to skip the header add:
result.to_csv("output.csv", index=False, header=False)
I don't know what your input data looks like (it is a good idea to include it in your question). But note that currently you can obtain the same results just by doing:
import pandas as pd
df = pd.DataFrame([0]*16)
df.to_csv('results.csv', index=False, header=False)
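
To see what each option changes, the sketch below calls to_csv without a path, which returns the CSV text as a string instead of writing a file. The three values are made up and stand in for the question's result frame.

```python
import pandas as pd

# Hypothetical single-column frame, similar in shape to the question's result
result = pd.DataFrame(["12.5", "0.3", "7.1"])

# Default: both the index column and the header row (column name 0) are written
with_index = result.to_csv().splitlines()

# index=False drops the leading index column
no_index = result.to_csv(index=False).splitlines()

# header=False additionally drops the header row, leaving only the values
values_only = result.to_csv(index=False, header=False).splitlines()

print(with_index)   # [',0', '0,12.5', '1,0.3', '2,7.1']
print(no_index)     # ['0', '12.5', '0.3', '7.1']
print(values_only)  # ['12.5', '0.3', '7.1']
```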

Writing value to given field in csv file using pandas or csv module

Is there a way to write a value to a specific place in a given .csv file using pandas or the csv module?
I have tried using csv_reader to read the file and find the line that fits my requirements, but I couldn't figure out how to replace the value in the file with mine.
What I am trying to achieve is this: I have a spreadsheet of names and values. I am using JSON to update the values from the server, and after that I want to update my spreadsheet as well.
The latest solution I came up with was to create a separate sheet from which I will get the updated data, but this one is not working, because the dict is not written to the file in any fixed order.
def updateSheet(fileName, aValues):
    with open(fileName+".csv") as workingSheet:
        writer = csv.DictWriter(workingSheet,aValues.keys())
        writer.writeheader()
        writer.writerow(aValues)
I will appreciate any guidance and tips.

You can try working with the CSV file this way:
import pandas as pd
a = ['one','two','three']
b = [1,2,3]
english_column = pd.Series(a, name='english')
number_column = pd.Series(b, name='number')
predictions = pd.concat([english_column, number_column], axis=1)
save = pd.DataFrame({'english':a,'number':b})
save.to_csv('b.csv',index=False,sep=',')
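
For the specific case of changing one value in an existing file, a possible approach is to read the file, update the cell with a boolean mask, and write the whole file back. This is only a sketch: the column names 'name' and 'value', the helper update_value, and the temporary file are assumptions made for the example, not part of the question.

```python
import os
import tempfile
import pandas as pd

def update_value(csv_path, name, new_value):
    """Set the 'value' cell of every row whose 'name' matches, then rewrite the file."""
    df = pd.read_csv(csv_path)
    df.loc[df["name"] == name, "value"] = new_value
    df.to_csv(csv_path, index=False)
    return df

# Demonstrate on a throwaway file so nothing on disk is touched permanently
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sheet.csv")
    pd.DataFrame({"name": ["alice", "bob"], "value": [1, 2]}).to_csv(path, index=False)

    update_value(path, "bob", 42)
    updated = pd.read_csv(path)

print(updated)
```

Because the DataFrame keeps its column order, the columns are written back in the same sequence they were read.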
