I am trying to read a CSV and then iterate through an SDE to find matching feature classes, list their fields, and print them.
There is a table in the list, and I'm not able to skip over it and continue reading the CSV.
I get "IOError: table 1 does not exist" and only get the features that come before the table.
import arcpy
from arcpy import env
import sys
import os
import csv

with open('C:/Users/user/Desktop/features_to_look_for.csv', 'r') as t1:
    objectsinESRI = [r[0] for r in csv.reader(t1)]

env.workspace = "//conn/features#dev.sde"
fcs = arcpy.ListFeatureClasses('sometext.*')
for fcs in objectsinESRI:
    fieldList = arcpy.ListFields(fcs)
    for field in fieldList:
        print(fcs + " " + ("{0}".format(field.name)))
Sample CSV rows (I can't seem to post a screenshot of the Excel file):
feature 1
feature 2
feature 3
feature 4
table 1
feature 5
feature 6
feature 7
feature 8
feature 9
Result
feature 1
feature 2
feature 3
feature 4
Desired Result
feature 1
feature 2
feature 3
feature 4
feature 5
feature 6
feature 7
feature 8
feature 9
So, as stated, I have no clue about arcpy, but this seems the way to start. Looking at the docs, your objectsInEsri seems to be the equivalent of the datasets in the example. From there I extrapolate the following code which, depending on what print(fc) prints, you may need to extend with yet another for loop.
So try this:
for object in objectsInEsri:
    for fc in fcs:
        print(fc)
Or maybe this:
for object in objectsInEsri:
    for fc in fcs:
        for field in arcpy.ListFields(fc):
            print(object + " " + ("{0}".format(field.name)))
Then again, I may be completely wrong of course, but just write the outermost for first, see what it gives you, and keep building from there :)
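Since arcpy only runs inside ArcGIS, here is a minimal arcpy-free sketch of the pattern that would fix the original error: check each CSV name against the set of items that actually exist as feature classes (the role arcpy.Exists or the arcpy.ListFeatureClasses result would play) and skip the rest. The csv_text and valid_features values are made up for illustration.

```python
import csv
import io

# Stand-in for the CSV on disk; in the real script this would be
# open('C:/Users/user/Desktop/features_to_look_for.csv').
csv_text = "feature 1\ntable 1\nfeature 2\n"
names = [row[0] for row in csv.reader(io.StringIO(csv_text))]

# Stand-in for what arcpy.ListFeatureClasses would report from the SDE.
valid_features = {"feature 1", "feature 2"}

kept = []
for name in names:
    if name not in valid_features:  # same role as an arcpy.Exists check
        continue                    # skip tables / missing items
    kept.append(name)

print(kept)
```

The same guard dropped into the original loop (before calling arcpy.ListFields) would let the script continue past "table 1" instead of raising IOError.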
I'm having an issue with the output of the enumerate function. It is adding parentheses and commas to the data. I'm trying to use the list for a comparison loop. Can anyone tell me why special characters resembling tuples are added? I'm going crazy here trying to finish this, but this bug is causing issues.
# Pandas is a software library written for the Python programming language for data manipulation and analysis.
import pandas as pd
#NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays
import numpy as np
df=pd.read_csv("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/datasets/dataset_part_1.csv")
df.head(10)
df.isnull().sum()/df.count()*100
df.dtypes
# Apply value_counts() on column LaunchSite
df[['LaunchSite']].value_counts()
# Apply value_counts on Orbit column
df[['Orbit']].value_counts()
#landing_outcomes = values on Outcome column
landing_outcomes = df[['Outcome']].value_counts()
print(landing_outcomes)
#following causes data issue
for i,outcome in enumerate(landing_outcomes.keys()):
    print(i,outcome)
#following also causes an issue to the data
bad_outcomes=set(landing_outcomes.keys()[[1,3,5,6,7]])
bad_outcomes
# landing_class = 0 if bad_outcome
# landing_class = 1 otherwise
landing_class = []
for value in df['Outcome'].items():
    if value in bad_outcomes:
        landing_class.append(0)
    else:
        landing_class.append(1)
df['Class']=landing_class
df[['Class']].head(8)
df.head(5)
df["Class"].mean()
The issue I'm having is
for i,outcome in enumerate(landing_outcomes.keys()):
    print(i,outcome)
is changing my data and giving an output of
0 ('True ASDS',)
1 ('None None',)
2 ('True RTLS',)
3 ('False ASDS',)
4 ('True Ocean',)
5 ('False Ocean',)
6 ('None ASDS',)
7 ('False RTLS',)
additionally, when I run
bad_outcomes=set(landing_outcomes.keys()[[1,3,5,6,7]])
bad_outcomes
my output is
{('False ASDS',),
('False Ocean',),
('False RTLS',),
('None ASDS',),
('None None',)}
I do not understand why my data return is far from expected and how to correct it.
Try this
for i, (outcome,) in enumerate(landing_outcomes.keys()):
    print(i, outcome)
Or
for i, outcome in enumerate(landing_outcomes.keys()):
    print(i, outcome[0])
I'm a building energy simulation modeller with an Excel question, to enable automated large-scale simulations using parameter samples (samples generated using Monte Carlo). Now I have the following question about saving my samples:
I want to save each row of an Excel-spreadsheet in a separate .txt-file in a 'special' way to be read by simulation programs.
Let's say I have the following Excel file with 4 parameters (a, b, c, d) and 20 values underneath:
a b c d
2 3 5 7
6 7 9 1
3 2 6 2
5 8 7 6
6 2 3 4
Each row of this spreadsheet represents a simulation-parameter-sample.
I want to store each row in a separate .txt-file as follows (so 5 '.txt'-files for this spreadsheet):
'1.txt' should contain:
a=2;
b=3;
c=5;
d=7;
'2.txt' should contain:
a=6;
b=7;
c=9;
d=1;
and so on for files '3.txt', '4.txt' and '5.txt'.
So basically matching the header with its corresponding value underneath for each row in a separate .txt-file ('header equals value;').
Is there an Excel add-in that does this or is it better to use some VBA-code? Anybody some idea?
(I'm quite experienced in simulation modelling but not in programming, hence this rather easy parameter-sample-saving question about Excel. Solutions in Python are also welcome if that's easier for you people.)
My idea would be to use Python along with pandas, as it's one of the most flexible solutions and your use case might expand in the future.
I'm going to make this as simple as possible. I'm assuming that you have Python, that you know how to install packages via pip or conda, and that you are ready to run a Python script on whatever system you are using.
First your script needs to import pandas and read the file into a DataFrame:
import pandas as pd
df = pd.read_excel('path/to/your/file.xlsx')
(Note that you might need to install the openpyxl or xlrd package in addition to pandas.)
Now you have a powerful data structure that you can manipulate in plenty of ways. I guess the most intuitive one would be to loop over all items. Use string formatting to put the strings together the way you need them:
for row in df.index:
    s = ""
    for col in df.columns:
        s += "{}={};\n".format(col, df[col][row])
    print(s)
Now you just need to write to a file using Python's built-in open. I'll name the files by the index of the row, but note that this solution will overwrite text files created by earlier runs of the script. You might want to add something unique, such as the date and time or the name of the input file, or increment the file name across runs.
All together we get:
import pandas as pd

df = pd.read_excel('path/to/your/file.xlsx')

file_count = 0
for row in df.index:
    s = ""
    for col in df.columns:
        s += "{}={};\n".format(col, df[col][row])
    with open('test_{:03}.txt'.format(file_count), "w") as file:
        file.write(s)
    file_count += 1
Note that this is probably not the most elegant way and that there are one-liners out there, but since you are not a programmer I thought you might prefer a more intuitive approach that you can tweak yourself easily.
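For reference, a more compact variant of the same loop using DataFrame.iterrows; the DataFrame is built inline here only so the sketch is self-contained (in practice it would come from read_excel as above), and the file naming matches the question's 1.txt, 2.txt, … scheme:

```python
import pandas as pd

# Inline stand-in for the first two rows of the spreadsheet in the question.
df = pd.DataFrame({'a': [2, 6], 'b': [3, 7], 'c': [5, 9], 'd': [7, 1]})

for i, row in df.iterrows():
    with open('{}.txt'.format(i + 1), 'w') as f:
        # row.items() yields (column name, value) pairs for this row.
        f.write(''.join('{}={};\n'.format(col, val) for col, val in row.items()))
```

Each file then contains one `name=value;` line per column, e.g. 1.txt holds `a=2;` through `d=7;`.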
I got this to work in Excel. You can expand the ranges of the loop variables x, y and z to match your situation, and use the LastRow and LastColumn methods to find the dimensions of your data set. I named the original worksheet "Data", as shown below.
Sub TestExportText()
    Dim Hdr(1 To 4) As String
    Dim x As Long
    Dim y As Long
    Dim z As Long

    For x = 1 To 4
        Hdr(x) = Cells(1, x)
    Next x

    x = 1
    For y = 1 To 5
        ThisWorkbook.Sheets.Add After:=Sheets(Sheets.Count)
        ActiveSheet.Name = y
        For z = 1 To 4
            With ActiveSheet
                .Cells(z, 1) = Hdr(z) & "=" & Sheets("Data").Cells(x + 1, z) & ";"
            End With
        Next z
        x = x + 1
        ActiveSheet.Move
        ActiveWorkbook.ActiveSheet.SaveAs Filename:="File" & y & ".txt", FileFormat:=xlTextWindows
        ActiveWorkbook.Close SaveChanges:=False
    Next y
End Sub
If you can save your Excel spreadsheet as a CSV file, then this Python script will do what you want.
with open('data.csv') as file:
    data_list = [line.rstrip('\n').split(',') for line in file]

# Row 0 holds the headers; each later row becomes its own numbered file.
for row_number in range(1, len(data_list)):
    with open('{}.txt'.format(row_number), 'w') as out_file:
        for col in range(len(data_list[row_number])):
            out_file.write('{}={};\n'.format(data_list[0][col], data_list[row_number][col]))
I have an array that takes in string values from a JSON file. I want to create a document-term matrix to see the repeated words, but when I pass in the array I get an error:
AttributeError: 'NoneType' object has no attribute 'lower'
This is the line that always raises the error:
sparse_matrix = count_vectorizer.fit_transform(issues_description)
import json
import pandas as pd

issues_description = []
issues_key = []
with open('issues_CLOVER.json') as json_file:
    data = json.load(json_file)
    for record in data:
        issues_key.append(record['key'])
        issues_description.append(record['fields']['description'])

df = pd.DataFrame({'Key' : issues_key, 'Description' : issues_description})
df.head(10)
This is the data that gets displayed:
Key Description
0 CLOV-1985 h2. Environment Details\r\n\r\nThis bug occurs...
1 CLOV-1984 Clover fails to instrument source code in case...
2 CLOV-1979 If a type argument for a parameterized type ha...
3 CLOV-1978 Bug affects Clover 3.3.0 and higher.\r\n\r\n \...
4 CLOV-1977 Add support to able to:\r\n * instrument sourc...
5 CLOV-1976 Add support to Groovy code in Clover for Eclip...
6 CLOV-1973 See also --CLOV-1956--.\r\n\r\nIn case HUDSON_...
7 CLOV-1970 Steps to reproduce:\r\n\r\nCoverage Explorer >...
8 CLOV-1967 Test Clover against IntelliJ IDEA 2016.3 EAP (...
9 CLOV-1966 *Problem*\r\n\r\nClover Maven Plugin replaces ...
# Scikit Learn
from sklearn.feature_extraction.text import CountVectorizer
# Create the Document Term Matrix
count_vectorizer = CountVectorizer(stop_words='english')
count_vectorizer = CountVectorizer()
sparse_matrix = count_vectorizer.fit_transform(issues_description)
# OPTIONAL: Convert Sparse Matrix to Pandas Dataframe if you want to see the word frequencies.
doc_term_matrix = sparse_matrix.todense()
df = pd.DataFrame(doc_term_matrix,
                  columns=count_vectorizer.get_feature_names(),
                  index=[issues_key[0],issues_key[1],issues_key[2]])
df
What do I change to make issues_description a passable argument, or can someone point me to what I need to know to make it work?
Thanks.
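The traceback suggests some entries in issues_description are None: CountVectorizer calls .lower() on every document, and a missing JSON description would come through as None. A hedged sketch of one way to handle that, replacing None with an empty string before vectorizing; the sample documents are made up:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up stand-ins for record['fields']['description'] values,
# including a None like the one that triggers the AttributeError.
issues_description = ['clover fails to instrument', None, 'clover maven plugin']

# Replace None with an empty string so every document is a str.
clean_descriptions = [d if d is not None else '' for d in issues_description]

count_vectorizer = CountVectorizer(stop_words='english')
sparse_matrix = count_vectorizer.fit_transform(clean_descriptions)
print(sparse_matrix.shape)  # one row per document, one column per term
```

Dropping the None records entirely (together with their keys) would work too, if empty rows in the matrix are unwanted.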
I have a dbf table. I want to automatically divide this table into two or more tables using Python. The main problem is that the table consists of groups of lines, and each group is separated from the previous one by an empty line. So I need to save each group to a new dbf table. I think this could be solved using some function from the arcpy package together with FOR and WHILE loops, but my brain can't solve it :D :/ My source dbf table is more complex, but I attach a simple example for better understanding. Sorry for my poor English.
Source dbf table:
ID NAME TEAM
1 A 1
2 B 2
3 C 1
4
5 D 2
6 E 3
I want get dbf1:
ID NAME TEAM
1 A 1
2 B 2
3 C 1
I want get dbf2:
ID NAME TEAM
1 D 2
2 E 3
Using my dbf package it could look something like this (untested):
import dbf

source_dbf = '/path/to/big/dbf_file.dbf'
base_name = '/path/to/smaller/dbf_%03d'

sdbf = dbf.Table(source_dbf)

i = 1
ddbf = sdbf.new(base_name % i)

sdbf.open()
ddbf.open()

for record in sdbf:
    if not record.name:  # assuming if 'name' is empty, all are empty
        ddbf.close()
        i += 1
        ddbf = sdbf.new(base_name % i)
        ddbf.open()
        continue
    ddbf.append(record)

ddbf.close()
sdbf.close()
I have data in a text file and I would like to be able to modify the file by columns and output it again. I normally write in C (basic ability) but chose Python for its obvious string benefits. I haven't ever used Python before, so I'm a tad stuck. I have been reading up on similar problems, but they only show how to change whole lines. To be honest, I have no clue what to do.
Say I have the file
1 2 3
4 5 6
7 8 9
and I want to be able to change column two with some function say multiply it by 2 so I get
1 4 3
4 10 6
7 16 9
Ideally I would like to be able to easily change the program to apply any function to any column.
For anyone who is interested, it is for modifying lab data for plotting, e.g. taking the log of the first column.
Python is an excellent general-purpose language; however, if you are on a Unix-based system, you might take a look at awk. The awk language is designed for exactly this kind of text-based transformation. The power of awk is easy to see for your question, as the solution is only a few characters: awk '{$2=$2*2;print}'.
$ cat file
1 2 3
4 5 6
7 8 9
$ awk '{$2=$2*2;print}' file
1 4 3
4 10 6
7 16 9
# Multiply the third column by 10
$ awk '{$3=$3*10;print}' file
1 2 30
4 5 60
7 8 90
In awk each column is referenced by $i, where i is the ith field. So we just set the value of the second field to the value of the second field multiplied by two, and print the line. This can be written even more concisely as awk '{$2=$2*2}1' file, but it is best to be clear at the beginning.
Here is a very simple Python solution:
for line in open("myfile.txt"):
    col = line.strip().split(' ')
    print(col[0], int(col[1]) * 2, col[2])
There are plenty of improvements that could made but I'll leave that as an exercise for you.
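One such improvement, sketched here since the question asks for it, is parameterising the column and the function so any transformation (doubling, math.log, …) can be applied to any column. The file names and sample data are assumptions for illustration:

```python
def transform_column(in_path, out_path, col_index, func):
    """Apply func to one whitespace-separated column, keeping the rest."""
    with open(in_path) as src, open(out_path, 'w') as dst:
        for line in src:
            cols = line.split()
            # 'g' formatting prints 4.0 as 4, keeping integer-looking output.
            cols[col_index] = format(func(float(cols[col_index])), 'g')
            dst.write(' '.join(cols) + '\n')

# Example: double the second column of a small sample file.
with open('sample.txt', 'w') as f:
    f.write('1 2 3\n4 5 6\n7 8 9\n')

transform_column('sample.txt', 'out.txt', 1, lambda v: v * 2)
print(open('out.txt').read())
```

Swapping in math.log and column index 0 would give the log-of-first-column case mentioned above.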
I would use pandas or just numpy. Read your file with:
data = pd.read_csv('file.txt', header=None, delim_whitespace=True)
then work with the data in a spreadsheet-like style, e.g.:
data[1] *= 2
(with header=None the columns are labelled 0, 1, 2, so data[1] is the second column), and finally write to file again with:
data.to_csv('output.txt', sep=' ', header=False, index=False)
As @sudo_O said, there are more efficient tools than Python for this task. However, here is a possible solution:
from itertools import repeat
import csv

fun = pow

with open('m.in', 'r') as input_file:
    with open('m.out', 'w', newline='') as out_file:
        inpt = csv.reader(input_file, delimiter=' ')
        out = csv.writer(out_file, delimiter=' ')
        for row in inpt:
            row = [int(e) for e in row]  # conversion to int
            opt = repeat(2, len(row))    # exponent 2 for every value
            # write ( function(data, argument) )
            out.writerow([str(elem) for elem in map(fun, row, opt)])
Here it multiplies every number by itself, but you can configure it to square only the second column by changing opt: opt = [1 + (col == 1) for col in range(len(row))] (exponent 2 for column 1, exponent 1 otherwise).