Python to C conversion, reading columns of csv into arrays

I am looking to read a csv file line by line and write each column to arrays in C. Essentially I want to convert the following Python code to C:
import csv
date = []
x = []
y = []
with open("file.csv") as old:
    read = csv.reader(old, delimiter=',')
    for row in read:
        date.append(row[0])
        x.append(float(row[1]))
        y.append(float(row[2]))
The csv file has 128 rows and three columns; date,x,y. My thoughts:
char Date[];
int Date[128], i;
for(i = 0; i < 128; i++)
{
Date[i] = x;
}
This is a simple example I have attempted to fill an array with values within a for loop. I want to know how I can modify this to fill arrays with each line of a csv file split by the ',' delimiter? I want to use the fscanf function but am unsure about how to incorporate it into the above setting?
Attempt:
FILE* f = fopen("file.csv", "r");
fscanf(f, "%char, %f %f", time, &value, &value);
Update:
The following code reads in a text file of my data and outputs to the screen:
#include <stdio.h>
int main(void)
{
    char buff[128];
    FILE * myfile;
    myfile = fopen("File.txt","r");
    while (!feof(myfile))
    {
        fgets(buff,128,myfile);
        printf("%s",buff);
    }
    fclose(myfile);
    return 0;
}
Instead of outputting to the screen, I want to store each column as an array. Any suggestions on how to do this?
Update 2.
I have updated the code as follows:
#include <stdio.h>
#include <string.h>
int main(void)
{
    char buff[128];
    char * entry;
    FILE * myfile;
    myfile = fopen("file.txt","r");
    while(fgets(buff,128,myfile)){
        puts(buff);
        entry = strtok(buff,",");
        while(entry!=NULL)
        {
            printf("%s\n",entry);
            entry = strtok(NULL,",");
        }
    }
    return 0;
}
Final Update.
I have found an example that does something very similar and is much more intuitive for me (given my limited ability in C)
https://cboard.cprogramming.com/c-programming/139377-confused-parsing-csv-txt-file-strtok-fgets-sscanf.html
Updated code
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
    char line[200];
    int i = 0;
    int x[50];
    int y[50];
    int z[50];
    FILE *myFile;
    myFile = fopen("file.txt", "rt");
    while(fgets(line, sizeof line, myFile) != NULL)
    {
        if(sscanf(line, "%d,%d,%d", &x[i], &y[i],&z[i]) == 3)
        {
            ++i;
        }
    }
    //Close the file
    fclose(myFile);
    getch();
    return 0;
}
This code works for me provided $x$, $y$ and $z$ are integers/floats. However, when $x$ is a date, I am unable to parse it. The date is of the form $Year-Month-day-Time$.
My attempt
I have tried changing the line
if(sscanf(line, "%d,%d,%d", &x[i], &y[i],&z[i]) == 3)
to
if(sscanf(line, "%d-%d-%d-%d",&year[i],&month[i], &day[i],&time[i], "%d,%d", &y[i],&z[i]) == 3)
and have declared new arrays
int year, int month, int day, int time
However, this approach gives garbage as the output. Any suggestions on how to modify this code to read and parse dates correctly?

You can use strtok to split a string by delimiter.
This describes the function. It includes a simple example.
Each call gives you a substring, ending where the delimiter was found. You can then copy that substring into an array of strings (each of a fixed maximum length, if you know the maximum), or you can keep an array of pointers to strings and allocate strlen(substring) + 1 bytes for each one before the copy (the +1 is for the null terminator).
Note that strtok modifies the string it scans: it overwrites each delimiter with '\0', and the pointers it returns point into that same buffer. So you must either use each token right away or copy it. If you just save the pointer that strtok returns, the next fgets call will overwrite the buffer and your "substrings" will not be what you expect. (See answer to this question for an explanation.)
And, please do not use !feof(myfile) to control the exit from your loop. See this post for an explanation. Since BradS gave an alternative in his comment, and it is not your main question, I won't repeat him here.
OK, looking at your sscanf approach and your question about dates:
I have used the various scanf and printf functions many times; I have never seen two different format strings in the same call, with parameters in the middle. From the function prototypes, this can't work. I think the only reason it compiles is because the variable parameter list in a variadic function does not have type checking. It will simply ignore these parameters: "%d,%d", &y[i],&z[i], because these are output parameters which do not correspond to anything in the format string. You can do something like:
if(sscanf(line, "%d-%d-%d-%d,%d,%d",&year[i],&month[i], &day[i],&time[i],&y[i],&z[i]) == 6)
This includes all the parameters in the format string.
Also, you mention creating arrays like this:
int year, int month, int day, int time
Those are not arrays. They are simple integers. You can declare an array like:
int year[MAX_SIZE];
I personally have found the scanf set of functions rather difficult to deal with; if the format string doesn't quite match the reality in the string you won't get what you expect/need. That is why I like strtok. But not everyone is comfortable with it; it is not the most obvious or intuitive interface.

Related

Traverse my data structure, and enter random values to it

I have a very big data structure.
There is only one structure, but it contains many sub-structures, which contain further sub-structures, and so on.
I have to assign random values to every variable in this structure.
I would have done it manually, but there are more than 10000 variables in it.
For example (this is just an illustration; the actual structure is much bigger):
struct qwerty{
unsigned short catch;
unsigned short port;
MediaAuthType_e mediaAuth;
typeShortNatmr NAT;
typeDynEpDom domain;
typeRDomList domainlist;
typeDom domainSize;
};
Each of these data types has sub-structures under it.
For example, for the MediaAuthType_e data type above we have a structure like:
struct MediaAuthType_e
{
int nunkhdr;
msg_body_list* unknown_msg_body;
int unknown_msg_body_count;
SipLssHandle Handle;
InfoEntry *dfo;
char* ua_uri;
char* accept;
void* s_contact;
char* branch;
char* chargeNum;
int 100Supported;
int 100Required;
};
and so on.
Can someone please help?
I just have to store random values in each of my variables.
Can I automate this process?
EDIT:
The reason I am doing this is that I have to encode the data to XDR format and then decode it, and check that I get the same values back.

The following pseudocode fills the structure with values, one byte at a time. However, the pointers will not be valid pointers! It just fills the whole memory area with sequential values; substitute your own logic to make them random.
unsigned long int i; // in case your structure is too big!
struct MediaAuthType_e *my_MediaAuthType_e;
my_MediaAuthType_e = malloc(sizeof(struct MediaAuthType_e));
char *tmp = (char *)my_MediaAuthType_e;
for(i = 0; i < sizeof(struct MediaAuthType_e); i++)
{
    *tmp = (i%255); // Assign some value to each byte; use your own logic to assign random values.
    tmp++;
}

printing struct array in lldb python

Following the question here: Writing a Python script to print out an array of recs in lldb
I would like to be able to create a type summary for an array of a given struct in lldb. Problem is that I am not able to access array correctly through python-lldb. Some data is incorrect.
I have the following test code in C:
#include <stdio.h>
#include <stdlib.h>
struct Buffer
{
struct Buffer* next;
struct Buffer* prev;
};
struct Base
{
struct Buffer* buffers;
int count;
};
void fill(struct Buffer* buf, int count)
{
for (int i = 0; i < count; ++i)
{
struct Buffer t = {(void*)0xdeadbeef,(void*)i};
buf[i] = t;
}
}
void foo(struct Base* base)
{
printf("break here\n");
}
int main(int argc, char** argv)
{
int c = 20;
void* buf = malloc(sizeof (struct Buffer) * c);
struct Base base = {.buffers = buf, .count = c};
fill(base.buffers, base.count);
foo(&base);
return 0;
}
In lldb:
(lldb) b foo
(lldb) r
(lldb) script
>>> debugger=lldb.debugger
>>> target=debugger.GetSelectedTarget()
>>> frame=lldb.frame
>>> base=frame.FindVariable('base')
>>> buffers=base.GetChildMemberWithName('buffers')
Now, buffers should point to array of struct Buffer and I should be able to access each and every Buffer via the buffers.GetChildAtIndex function, but the data is corrupted in the first 2 items.
>>> print buffers.GetChildAtIndex(0,0,1)
(Buffer *) next = 0x00000000deadbeef
>>> print buffers.GetChildAtIndex(1,0,1)
(Buffer *) prev = 0x0000000000000000
>>> print buffers.GetChildAtIndex(2,0,1)
(Buffer) [2] = {
next = 0x00000000deadbeef
prev = 0x0000000000000002
}
Only the items from buffers[2] upward are OK.
Why does print buffers.GetChildAtIndex(1,0,1) point to the prev member of buffers[0] instead of to buffers[1]?
What am I doing wrong?

GetChildAtIndex is trying to be a little over-helpful for your purposes here. It is in accord with the help, which says:
Pointers differ depending on what they point to. If the pointer
points to a simple type, the child at index zero
is the only child value available, unless synthetic_allowed
is true, in which case the pointer will be used as an array
and can create 'synthetic' child values using positive or
negative indexes. If the pointer points to an aggregate type
(an array, class, union, struct), then the pointee is
transparently skipped and any children are going to be the indexes
of the child values within the aggregate type. For example if
we have a 'Point' type and we have a SBValue that contains a
pointer to a 'Point' type, then the child at index zero will be
the 'x' member, and the child at index 1 will be the 'y' member
(the child at index zero won't be a 'Point' instance).
So really, buffers.GetChildAtIndex(2,0,1) should have returned "No Value". Either that, or passing 1 for the allow-synthetic argument should turn off this peek-through behavior. In either case, this is a bug; please file it with http://bugreporter.apple.com.
In the meantime, you should be able to get the same effect by walking your array by hand and using SBTarget.CreateValueFromAddress to create the values. Start by getting the address of the array with buffers.GetAddress(), and the size of a Buffer by getting the type of buffers, getting its pointee type, and calling GetByteSize on that. Then just increment the address by that size, count times, to create all the values.

Python (rospy) to C++ (roscpp) struct.unpack

I am currently translating a rospy IMU driver to roscpp and am having difficulty figuring out what this piece of code does and how I can translate it.
def ReqConfiguration(self):
    """Ask for the current configuration of the MT device.
    Assume the device is in Config state."""
    try:
        masterID, period, skipfactor, _, _, _, date, time, num, deviceID,\
            length, mode, settings =\
            struct.unpack('!IHHHHI8s8s32x32xHIHHI8x', config)
    except struct.error:
        raise MTException("could not parse configuration.")
    conf = {'output-mode': mode,
            'output-settings': settings,
            'length': length,
            'period': period,
            'skipfactor': skipfactor,
            'Master device ID': masterID,
            'date': date,
            'time': time,
            'number of devices': num,
            'device ID': deviceID}
    return conf
I have to admit that I have never worked with either ROS or Python before.
This is not a 1:1 copy of the source; I removed the lines whose purpose I think I understand. The try block in particular is what I don't understand. I would really appreciate help, because I am under great time pressure.
If someone is curious (for context): the files I have to translate are mtdevice.py, mtnode.py and mtdef.py, and they can be found by googling for the file names plus the keyword ROS IMU Driver.
Thanks a lot in advance.

This piece of code unpacks the fields of a C structure, namely masterID, period, skipfactor, _, _, _, date, time, num, deviceID, length, mode, settings, stores those in a Python dictionary and returns that dictionary as call result. The underscores are placeholders for the parts of the struct that aren't used.
See also: https://docs.python.org/2/library/struct.html, e.g. for a description of the format string ('!IHHHHI8s8s32x32xHIHHI8x') that tells the unpack function what the struct looks like.
The syntax a, b, c, d = f () means that the function returns a thing called a tuple in Python. By assigning a tuple to multiple variables, it's split into its fields.
Example:
t = (1, 2, 3, 4)
a, b, c, d = t
# At this point a == 1, b == 2, c == 3, d == 4
Replacing this piece of code with C++ should not be too hard, since C++ has structs much like C. So the simplest C++ implementation of ReqConfiguration would be to just return that struct. If you want to stay closer to the Python functionality, your function could put the fields of the struct into a C++ STL map and return that. The format string, together with the docs that the link points to, tells you what data types are in your struct and where.
Note that it's the second parameter of unpack that holds your data, the first parameter just contains information on the layout (format) of the second parameter, as explained in the link. The second parameter looks to Python as if it's a string, but it's actually a C struct. The first parameter tells Python where to find what in that struct.
So if you read the docs on format strings, you can find out the layout of your second parameter (C struct). But maybe you don't need to. It depends on the caller of your function. It may just expect the plain C struct.
From your added comments I understand that there's more code in your function than you show. The fields of the structs are assigned to attributes of a class.
If you know the field names of your C struct (config) then you can assign them directly to the attributes of your C++ class.
// Pointer 'this' isn't needed but inserted for clarity
this->mode = config.mode;
this->settings = config.settings;
this->length = config.length;
I've assumed that the field names of the config struct are indeed mode, settings, length etc. but you'd have to verify that. Probably the layout of this struct is declared in some C header file (or in the docs).

To do the same thing with C++, you'd declare a struct with the various parameters:
struct DeviceRecord {
    uint32_t masterID;
    uint16_t period, skipfactor, _a, _b;
    uint32_t _c;
    char date[8];
    char time[8];
    char padding[64];
    uint16_t num;
    uint32_t deviceID;
    uint16_t length, mode;
    uint32_t settings;
    char padding2[8];
};
(It's possible this struct is already declared somewhere; it might also use "unsigned int" instead of "uint32_t" and "unsigned short" instead of "uint16_t", and _a, _b, _c would probably have real names. Note that the '!' format describes 118 bytes with no alignment gaps, whereas a compiler will normally insert two bytes before deviceID to align it, so the struct has to be declared packed, for example with #pragma pack, for its layout to match the data.)
Once you have your struct, the question is how to get the data. That depends on where the data is. If it's in a file, you'd do something like this:
DeviceRecord rec; // An instance of the struct, whatever it's called
std::ifstream fin("yourfile.txt", std::ios::binary);
fin.read(reinterpret_cast<char*>(&rec), sizeof(rec));
// Now you can access rec.masterID etc
On the other hand, if it's somewhere in memory (ie, you have a char* or void* to it), then you just need to cast it:
void* data_source = get_data(...); // You'd get this from somewhere
DeviceRecord* rec_ptr = reinterpret_cast<DeviceRecord*>(data_source);
// Now you can access rec_ptr->masterID etc
If you have a std::vector, you can easily get such a pointer:
std::vector<uint8_t> data_source = get_data(...); // As above
DeviceRecord* rec_ptr = reinterpret_cast<DeviceRecord*>(data_source.data());
// Now you can access rec_ptr->masterID etc, provided data_source remains in scope. You should probably also avoid modifying data_source.
There's one more issue here. The data you've received is in big-endian, but unless you have a PowerPC or other unusual processor, you're probably on a little-endian machine. So you need to do a little byte-swapping before you access the data. You can use the following function to do this.
template<typename Int>
Int swap_int(Int n) {
    if(sizeof(Int) == 2) {
        union {char c[2]; Int i;} swapper;
        swapper.i = n;
        std::swap(swapper.c[0], swapper.c[1]);
        n = swapper.i;
    } else if(sizeof(Int) == 4) {
        union {char c[4]; Int i;} swapper;
        swapper.i = n;
        std::swap(swapper.c[0], swapper.c[3]);
        std::swap(swapper.c[1], swapper.c[2]);
        n = swapper.i;
    }
    return n;
}
This returns the swapped value rather than changing it in place, so now you'd access your data with something like swap_int(rec_ptr->num). NB: The above byte-swapping code is untested; I'll try compiling it a bit later and fix it if necessary.
Without more information, I can't give you a definitive way of doing this, but perhaps this will be enough to help you work it out on your own.

How to write an LLDB synthetic provider for a shaped view of std::vector data

I am trying to create LLDB visualizers for classes in my project. The LLDB documentation is... sparse. I have an array class that stores the underlying data in a std::vector and has an extent array to describe the shape. It can also be reshaped later.
By default, the std::vector "data_" is always shown as a linear vector. I would like my provider to create a view hierarchy. In this example, the first level would be the child rows, each row expanding to a list of column values. Similar to viewing a static 2D array (i.e. double[3][2]). You can imagine extending this to N dimensions.
I can't seem to figure out how to use the lldb python object model to impose hierarchical structure onto the linear buffer in std::vector.
Nothing seems to be documented, and I have been guessing in the dark for about a week. Here is a simplified example array class that I would like to create a visualizer for.
Any help is greatly appreciated!
#include <vector>
#include <cassert>
template <typename T>
class myarray {
    int extent_[2];
    std::vector<T> data_;
public:
    myarray(int r, int c, const T* data) {
        extent_[0] = r;
        extent_[1] = c;
        data_.resize(r * c);
        for(size_t i = 0; i < data_.size(); ++i) data_[i] = data[i];
    }
    void reshape(int r, int c) {
        assert(r * c == data_.size());
        extent_[0] = r;
        extent_[1] = c;
    }
};
int main(int argc, const char * argv[])
{
    double initdata[6] = { 0, 1, 2, 3, 4, 5 };
    myarray<double> mydata(3, 2, initdata);
    mydata.reshape(1, 6);
    return 0;
}
As requested: The output I would like to see for the first [3][2] example might look like the following. The first level of 3 children are "rows", with a summary string of the leading elements in the row. The idea is to get a 2D view of the matrix data. Then when a row is expanded, it would be viewed as an array of column values.
LLDB potential synthetic output:
mydata
[0]
[0] = 0 <-- expanded contents
[1] = 1
[1] = {2, 3} <-- summary string of row contents. First N elements, then ...
[2] = {4, 5}
The synthetic provider examples for a simple vector implement get_child_at_index something like this, where I determined the count, value_size, and value_type in the update() method:
def get_child_at_index(self,index):
    logger = lldb.formatters.Logger.Logger()
    logger >> "get_child_at_index: " + str(index)
    if index < 0: return None;
    if index >= self.count: return None;
    try:
        offset = index * self.value_size
        return self.data.CreateChildAtOffset('['+str(index)+']',offset,self.value_type)
    except:
        return None
I think I can easily work this out if I could just figure out how to create an SBType to use in place of value_type when calling CreateChildAtOffset. I think I could then lay down any kind of structure that I like. However, with many shots in the dark, I couldn't figure out how to create an SBType object successfully.
Ideas? Does anyone know how to create an SBType from a string that I compose?

I am assuming you have already looked over: http://lldb.llvm.org/varformats.html
IIUC, what you want to do is display the elements of the vector in a more hierarchical format.
It's kind of an interesting task, one for which you're probably going to have to craft your own data types - something for which I don't think we have a whole lot of support in our public API currently.
As a workaround, you can of course run an expression that generates the struct you care about and hold on to it - however that is going to be slow.
In your example, what exactly is the view you'd like to get? That kind of by example information can actually be helpful in figuring out more details.
EDIT: Currently LLDB doesn't let you create new types through the public API. What you can do to get your hands on an SBType of your own making is use the expression parser, as in this example:
x = lldb.frame.EvaluateExpression("struct foo { int x; }; foo myfoo = {12}; myfoo")
data = lldb.SBData.CreateDataFromSInt32Array(lldb.eByteOrderLittle,8,[24])
x_type = x.GetType()
myOtherFoo = x.CreateValueFromData("myOtherFoo",data,x_type)
print myOtherFoo
OUTPUT: (foo) myOtherFoo = (x = 24)
This is going to be fairly slow, especially if you don't cache the foo type you need (which from your example seems to be a T[2] for your template argument T), but until LLDB has an SB API to create types through clang (like we do internally), this is your only approach.

Not sure if this will help, but you can find existing types
target = lldb.debugger.GetSelectedTarget()
type_list = target.FindTypes('base::Value')
If you want to create your child with an existing type, that may help.

Error when using ctypes module to access a DLL written in C

I have a DLL with a single function that takes five doubles and one int:
__declspec(dllexport) struct res ITERATE(double z_r,double z_i,double c_r, double c_i, int iterations, double limit)
It returns a custom struct called res, which consists of a three-double array:
struct res {
double arr[3];
};
To return the values I do this:
struct res result; /*earlier in the code */
result.arr[0] = z_real; /*Just three random doubles*/
result.arr[1] = z_imag;
result.arr[2] = value;
return result;
I've compiled it with MinGW and I'm trying to use it in python to do something like this:
from ctypes import *
z = [0.0,0.0]
c = [1.0,1.0]
M = 2.0
MiDLL = WinDLL("RECERCATOOLS.dll")
MiDLL.ITERATE.argtypes = [c_double, c_double, c_double, c_double,c_int,c_double]
MiDLL.ITERATE(z[0],z[1],c[0],c[1],100,M) #testing out before assigning the result to anything.
But whenever I try to call the function with those values, it throws this:
WindowsError: exception: access violation writing 0x00000000
I also don't know how to catch the custom structure I declared and convert each of its elements into Python floating-point numbers. I've looked into this PyDocs link, but to no avail.
Thank you in advance.
EDIT:
This is the original (modified according to suggestions) header used ("mydll.h"):
#ifndef MYDLL_H
#define MYDLL_H
extern "C" __declspec(dllexport)
#define EXPORT_DLL __declspec(dllexport)
EXPORT_DLL void ITERATE(struct res*, double z_r,double z_i,double c_r, double c_i, int iterations, double limit)
#endif
And, in case something might be wrong with it, the code file (it's very short, just one function):
#include <stdio.h>
#include <complex.h>
struct res {
double arr[3];
};
void __declspec(dllexport) ITERATE(struct res* result,double z_r,double z_i,double c_r, double c_i, int iterations, double limit)
{
    /* The purpose of this function is, given two complex numbers,
    an iteration number and a limit, apply a formula to these
    two numbers for as many iterations as specified.
    If at any iteration the result of the formula is bigger than
    the limit, stop and return the number and the iteration it reached.
    If after iterating they are still inside the limit, return the
    number after all the iterations and the number of iterations
    it has gone through.
    Complex numbers are composed of a real part and an imaginary part,
    and they must be returned separately.
    */
    double complex z = z_r + z_i*I;
    double complex c = c_r + c_i*I;
    int actual_iter;
    for (actual_iter = 1; actual_iter <= iterations; actual_iter++)
    {
        z = z*z + c;
        if (cabs(z) > limit)
        {
            double value = actual_iter;
            double z_real = creal(z);
            double z_imag = cimag(z);
            /* result is a pointer, so its members are accessed with -> */
            result->arr[0] = z_real;
            result->arr[1] = z_imag;
            result->arr[2] = value;
            return; /* stop here, as described in the comment above */
        }
    }
    double value = iterations;
    double z_real = creal(z);
    double z_imag = cimag(z);
    result->arr[0] = z_real;
    result->arr[1] = z_imag;
    result->arr[2] = value;
}
int main()
{
    return 0;
}

There is a problem with returning structs like that. Not all compilers return such structures the same way. I'd rather change the function declaration to this:
void __declspec(dllexport) ITERATE(struct res* result, double z_r,double z_i,
double c_r, double c_i, int iterations, double limit);
That way the struct is in the user's memory, and there is no ambiguity on how the struct will be returned.
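Of course, as David said, you may have to use a different calling convention: WinDLL assumes stdcall, while a function compiled without __stdcall uses cdecl and should be loaded with CDLL.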
