PSPAR: Sparse Matrix Version of PSTAR

Andrew Seary

Simon Fraser University

15 - May - 1999

This is sparse documentation for PSPAR, the sparse matrix version of p*.

Download psparw32.zip. This is the 32-bit Win95/NT version, with sample data and output files.

If you download these programs, please send a brief message to either one of us to let us know about your experience with them. Send mail to: Andrew Seary seary@sfu.ca or Bill Richards richards@sfu.ca

The programs on this page were updated on June 11, 1999. Previous versions read only the ID numbers of the sender and receiver of each link. They did not read a third number that describes the presence/absence (or strength) of the link, and they assumed that the data file listed only the links that are present in the data.

I also updated adj2neg (available in the Utility Programs section) today. Previous versions could handle matrices with up to 400 rows/columns. They also created a line in the data file for every entry in the matrix, regardless of whether it was a 0, a 1, or something else. Data files created with the earlier version of adj2neg would thus include lines for pairs of nodes that had a 0 in the adjacency matrix. Although adj2neg put the number that was in row i, column j of the adjacency matrix after the ID numbers for node i and node j, previous versions of pspar did not check whether that number was non-zero. As a result, the "0" elements in the adjacency matrix were treated the same as the "1" elements.

The new version of adj2neg (the one available now) differs from the previous version in two ways: 1) it can handle networks with up to 2,000 rows/columns (previous versions handled only up to 400); 2) it asks whether you want it to include lines in the data file for "0" entries in the adjacency matrix. If you answer "n", it will include lines only for non-zero entries in the adjacency matrix. This is the option you should choose unless you want access to the "0" entries in your analysis.

The new versions of pspar (the ones available on this page) differ from the previous ones in one important way: they read three numbers from each line of data. The first two are the ID numbers of the row ("from") and column ("to") nodes. The third number tells whether the link was present ("1" in the adjacency matrix) or absent ("0" in the adjacency matrix). If the third number is "0", that line of data is ignored.
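The three-number rule described above can be illustrated with a minimal Python sketch. This is not PSPAR's own code (PSPAR is compiled Fortran); the function name `read_links` and the sample lines are hypothetical, and skipping lines with fewer than three fields is an assumption of the sketch.

```python
from typing import Iterable, List, Tuple


def read_links(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (sender, receiver) pairs for lines whose third number is non-zero."""
    links = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # assumption: blank or short lines are skipped
        sender, receiver, value = (int(f) for f in fields[:3])
        if value != 0:  # a "0" marks an absent link, so the line is ignored
            links.append((sender, receiver))
    return links


if __name__ == "__main__":
    sample = ["1 2 1", "1 3 0", "2 1 1"]  # made-up data, not from pspar.zip
    print(read_links(sample))  # [(1, 2), (2, 1)]
```

Under the older behaviour described above, the middle line (`1 3 0`) would have been counted as a link.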

PSPAR.EXE is a preliminary offering:

- it comes in two flavours:
  - the 32-bit Win95/NT version can handle up to 2,000 nodes and 500,000 links;
  - the 32-bit OS/2 version can handle up to 2,000 nodes and 500,000 links;

- both do stand-alone logistic regression;
- they fit to many of the statistics produced by PREPSTAR and more (not the p1 alphas and betas - these don't make sense for large networks). However, they DO fit to all 15 (non-trivial) triads, and to comparative networks;
- they produce output similar to that shown in the Connections article by Crouch and Wasserman;
- **they do not require PREPSTAR or any other pre-processing programs or files**
- **they do not require SPSS, SAS, BMDP, or any other analytic packages**

PSPAR was designed both

- to handle large networks, AND
- to make p* fitting easy to use

Both the Win95/NT and the OS/2 versions are 32-bit, compiled with GNU F77, and can deal with networks that have up to 2,000 nodes and 500,000 links.

**PSPAR is NOT a pre-processor**

PSPAR is *not* a program you run as a first step; it is *not* a program that you run to prepare your data for other analysis. With PSPAR, you do NOT use SAS, SPSS, or any other package that does the regression analysis.

**PSPAR does everything in one step. It does the complete logistic regression.**

PSPAR does not require or produce huge files. In particular, it does NOT use adjacency matrices. If your data is in adjacency matrix format, you can convert it to the required format by using ADJ2NEG.EXE, available at Bill Richards' web site in the "Utility Programs" section.
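For readers who want to see what such a conversion involves, here is a minimal Python sketch of turning an adjacency matrix into link-list lines. It is not the ADJ2NEG source; the function name, the `include_zeros` flag (modelled on the question ADJ2NEG asks), and the use of 1-based row/column positions as node IDs are all assumptions.

```python
from typing import List


def adjacency_to_link_lines(matrix: List[List[int]],
                            include_zeros: bool = False) -> List[str]:
    """Emit one "sender receiver value" line per matrix entry.

    Node IDs are assumed to be the 1-based row and column positions.
    Entries equal to 0 are dropped unless include_zeros is True.
    """
    lines = []
    for i, row in enumerate(matrix, start=1):
        for j, value in enumerate(row, start=1):
            if value != 0 or include_zeros:
                lines.append(f"{i} {j} {value}")
    return lines


if __name__ == "__main__":
    adjacency = [  # made-up 3-node network
        [0, 1, 0],
        [1, 0, 1],
        [0, 0, 0],
    ]
    for line in adjacency_to_link_lines(adjacency):
        print(line)
```

Because the matrix is walked row by row, the lines come out ordered by sender ID and then by receiver ID.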

Network input files are NEGOPY-style link lists. There is one line of data in the file for each link. Each line of data contains the ID number of the "sender", the ID number of the "receiver", and a "1" to indicate the presence of a link. Examples are included in pspar.zip, with the extension .NEG.

To see a sample .NEG file, click here.

For large sparse networks, this is a much more efficient representation (and much easier to create, check, and edit) than adjacency matrices. NEGOPY-style files are simplified versions of the .LNK files used by FATCAT and MultiNet.

Blocking is accomplished with attribute files which list node ID numbers and attributes.

There are example attribute files in pspar.zip, with extension .ATR. (These are simplified versions of the .IND files used by FATCAT and MultiNet.)
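As a rough illustration of how an attribute file could drive blocking, the Python sketch below maps each link to a block type. The line layout (node ID followed by integer attributes), categories coded 1..k, the row = sender and column = receiver convention, and both function names are assumptions of this sketch, not documented PSPAR behaviour.

```python
from typing import Dict, Iterable, List


def read_attributes(lines: Iterable[str], column: int = 1) -> Dict[int, int]:
    """Map node ID -> integer category taken from the chosen attribute column.

    Assumed layout: "id attr1 attr2 ..."; column=1 selects attr1.
    """
    categories = {}
    for line in lines:
        fields = line.split()
        if len(fields) <= column:
            continue  # assumption: short or blank lines are skipped
        categories[int(fields[0])] = int(fields[column])
    return categories


def block_type(sender: int, receiver: int,
               categories: Dict[int, int],
               block_types: List[List[int]]) -> int:
    """Look up the block type for a link (row = sender's category,
    column = receiver's category, categories assumed to start at 1)."""
    return block_types[categories[sender] - 1][categories[receiver] - 1]


if __name__ == "__main__":
    atr_lines = ["1 1", "2 1", "3 2"]    # made-up attribute data
    cats = read_attributes(atr_lines)
    within_between = [[1, 0], [0, 1]]    # two categories: same vs. different
    print(block_type(1, 2, cats, within_between))  # 1 (same category)
    print(block_type(1, 3, cats, within_between))  # 0 (different categories)
```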

Comparative networks can be any .NEG file.

Sample output files are included in pspar.zip, with extension .OUT.

**CURRENT RESTRICTIONS:**

1. Interaction with the program and error-handling are currently rudimentary.

2. For PSPAR, the .NEG files are assumed to be sorted by ID number.

    like this     not like this
       1 2            5 5
       1 3            5 2
       1 7            2 4
       1 9            1 3
       2 1            2 3
       2 4            2 1
       2 6            1 7
       2 8            1 2
       3 4            3 5
       3 5            1 9
       : :            : :

This is the format automatically produced by ADJ2NEG
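If a link list was produced some other way, its order can be checked and repaired before running PSPAR. The Python sketch below uses the two columns of the example above as test data. It checks the stricter sender-then-receiver order shown in the "like this" column; whether PSPAR also needs receivers in order is an assumption, and the function names are hypothetical.

```python
from typing import List, Tuple

Link = Tuple[int, int]


def is_sorted_by_id(links: List[Link]) -> bool:
    """True if links are in non-decreasing (sender, receiver) order."""
    return all(a <= b for a, b in zip(links, links[1:]))


def sort_by_id(links: List[Link]) -> List[Link]:
    """Return a copy ordered by sender ID, then receiver ID."""
    return sorted(links)


if __name__ == "__main__":
    good = [(1, 2), (1, 3), (1, 7), (1, 9), (2, 1),
            (2, 4), (2, 6), (2, 8), (3, 4), (3, 5)]
    bad = [(5, 5), (5, 2), (2, 4), (1, 3), (2, 3),
           (2, 1), (1, 7), (1, 2), (3, 5), (1, 9)]
    print(is_sorted_by_id(good))             # True
    print(is_sorted_by_id(bad))              # False
    print(is_sorted_by_id(sort_by_id(bad)))  # True
```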

3. Only integer data can be read from attribute and comparative files.

4. The following are current size limits:

- In Win95, NT, OS/2:
  - 2,000 nodes maximum.
  - 500,000 links maximum.

- 64 parameters maximum may be fit.
- 8 blocking attributes maximum. Each attribute may have 16 categories, so 16 x 16 blocks maximum.
- 16 block types maximum
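Since error-handling in the program is rudimentary, it may help to check a data set against these limits beforehand. The Python sketch below is one way to do that. The constants come from the list above, but the function name and the choice to count distinct node IDs (rather than, say, the largest ID) are assumptions.

```python
from typing import Iterable, List, Tuple

MAX_NODES = 2000
MAX_LINKS = 500_000
MAX_PARAMETERS = 64
MAX_ATTRIBUTES = 8


def check_limits(links: Iterable[Tuple[int, int]],
                 n_parameters: int,
                 n_attributes: int = 0) -> List[str]:
    """Return a list of human-readable limit violations (empty if none)."""
    links = list(links)
    node_ids = {node for link in links for node in link}
    problems = []
    if len(node_ids) > MAX_NODES:
        problems.append(f"{len(node_ids)} nodes exceeds {MAX_NODES}")
    if len(links) > MAX_LINKS:
        problems.append(f"{len(links)} links exceeds {MAX_LINKS}")
    if n_parameters > MAX_PARAMETERS:
        problems.append(f"{n_parameters} parameters exceeds {MAX_PARAMETERS}")
    if n_attributes > MAX_ATTRIBUTES:
        problems.append(f"{n_attributes} attributes exceeds {MAX_ATTRIBUTES}")
    return problems


if __name__ == "__main__":
    print(check_limits([(1, 2), (2, 3)], n_parameters=5, n_attributes=1))  # []
```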

Larger versions which have most of these restrictions removed are in the works. They will do more with p*, too. But first...

We WELCOME any questions or comments.

PLEASE send email to:

Andrew Seary seary@sfu.ca or Bill Richards richards@sfu.ca

This sparse documentation will be expanded along with the program.

Here is a sample run of the program, using the class4 data from the p* home page. To see a complete output file, click here.

    ----------------------------------------------------------
    D:>pspar
    Sparse Matrix p* by Andrew Seary (March, 1999)
    Enter name of network file: class4.neg
    Include diagonal (y or n)? y
    Fit to block parameters (y or n)? y
    Enter name of attribute file: class4.atr
    How many attributes (not including id)? 1
    Enter name of output file: class4.out
    Reading class4.neg ....
    Enter attribute number for blocking (1- 1): 1
      1  0
      0  1
    Accept this block structure? (y or n): y
    Select from
    Edges:    1) i->j,
    REdges:   2) i<>j
    2Stars:   3) k<-i->j,      4) k->i<-j,      5) k->i->j
    Triads:   6) i->j->k<-i,   7) i->j->k->i,
    R2Stars:  8) k<>i->j,      9) k<>i<-j,     10) k<>i<>j
    RTriads: 11) i<>j->k<-i,  12) i<>j<-k->i,  13) i<>j<-k<-i
             14) i<>j<>k<-i,  15) i<>j<>k<>i
    Comparative network: 16)
    Add 100 for correponding block parameter
    How many parameters? 5
    Enter parameter numbers: 1 101 2 102 6
    Pass 1.. 2.. 3.. 4.. 5.. 6..
    Final -2 Log Likelihood = 435.405
    Goodness of Fit = 483.413
    Model Chisquare = 363.101 df = 5

               Fit       % Correct    Residuals
    Data    375    40      90.36      Absolute  138.0584
             60   101      62.73      Squared    70.0653
                 Overall   82.64

    Parameter  Block        b      S.E.       Wald    exp(b)
        1             -3.5750     .3021   140.0862     .0280
        1        1      .4370     .3642     1.4394    1.5481
        2              1.2794     .5225     5.9960    3.5944
        2        1      .3397     .5983      .3223    1.4045
        6               .2769     .0375    54.5833    1.3190
    Continue? (y or n): y
    Same files? (y or n): y
    Same blocking? y
    Select from
    Edges:    1) i->j,
    REdges:   2) i<>j
    2Stars:   3) k<-i->j,      4) k->i<-j,      5) k->i->j
    Triads:   6) i->j->k<-i,   7) i->j->k->i,
    R2Stars:  8) k<>i->j,      9) k<>i<-j,     10) k<>i<>j
    RTriads: 11) i<>j->k<-i,  12) i<>j<-k->i,  13) i<>j<-k<-i
             14) i<>j<>k<-i,  15) i<>j<>k<>i
    Comparative network: 16)
    Add 100 for correponding block parameter
    How many parameters?
      :
      :
    -----------------------------------------
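The columns of the parameter table in the class4 run can be cross-checked by hand. The Python sketch below assumes the usual logistic-regression conventions, Wald = (b / S.E.)^2 and exp(b) as the odds multiplier. The program's documentation does not state these formulas, so treat them as an inference. The function names are hypothetical; the numbers are copied from the run above.

```python
import math


def wald(b: float, se: float) -> float:
    """Wald statistic under the assumed definition (b / S.E.) squared."""
    return (b / se) ** 2


def odds_multiplier(b: float) -> float:
    """exp(b): multiplicative change in the odds of a tie."""
    return math.exp(b)


# (b, S.E., reported Wald, reported exp(b)) from the class4 sample run
ROWS = [
    (-3.5750, 0.3021, 140.0862, 0.0280),
    (0.4370, 0.3642, 1.4394, 1.5481),
    (1.2794, 0.5225, 5.9960, 3.5944),
    (0.3397, 0.5983, 0.3223, 1.4045),
    (0.2769, 0.0375, 54.5833, 1.3190),
]

if __name__ == "__main__":
    for b, se, reported_wald, reported_exp in ROWS:
        print(f"Wald {wald(b, se):9.4f} (reported {reported_wald:9.4f})   "
              f"exp(b) {odds_multiplier(b):7.4f} (reported {reported_exp:7.4f})")
```

The recomputed values agree with the printed ones only up to rounding, because b and S.E. are themselves shown to four decimal places.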

Here is another sample run, using the Vickers and Chan data from the Wasserman & Pattison paper.

    -----------------------------------------
    Enter name of network file: vcga.neg
    Include diagonal (y or n)? n
    Fit to block parameters (y or n)? y
    Enter name of attribute file: vcga.atr
    How many attributes (not including id)? 1
    Enter name of output file: vcga.out
    Reading vcga.neg ....
    Enter attribute number for blocking (1- 1): 1
      1  0
      0  1
    Accept this block structure? (y or n): n
    Enter 2 rows, and 2 columns of block types between 0 and 16
    Row 1: 1 3
    Row 2: 4 2
    Select from
    Edges:    1) i->j,
    REdges:   2) i<>j
    2Stars:   3) k<-i->j,      4) k->i<-j,      5) k->i->j
    Triads:   6) i->j->k<-i,   7) i->j->k->i,
    R2Stars:  8) k<>i->j,      9) k<>i<-j,     10) k<>i<>j
    RTriads: 11) i<>j->k<-i,  12) i<>j<-k->i,  13) i<>j<-k<-i
             14) i<>j<>k<-i,  15) i<>j<>k<>i
    Comparative network: 16)
    How many global parameters? 2
    Enter parameter numbers: 2 6
    Block structure:
      1  3
      4  2
    Select parameter and number of blocks. 0 0 to quit.
    Parameter, number of Blocks: 1 4
    Parameter, number of Blocks: 0 0
    Pass 1.. 2.. 3.. 4.. 5..
    Final -2 Log Likelihood = 752.992
    Goodness of Fit = 776.118
    Model Chisquare = 372.679 df = 6

               Fit       % Correct    Residuals
    Data    359    94      79.25      Absolute  246.9875
             98   261      72.70      Squared   124.0224
                 Overall   76.35

    Parameter  Block        b      S.E.       Wald    exp(b)
        2              1.3265     .1960    45.7888    3.7678
        6               .1319     .0125   111.7546    1.1410
        1        1    -2.2206     .2737    65.8364     .1085
        1        2    -3.1949     .3139   103.6289     .0410
        1        3    -2.9501     .2834   108.3339     .0523
        1        4    -4.3489     .3314   172.2432     .0129
    Continue? (y or n): y
    Same files? (y or n): y
    Same blocking? y
    Select from
    Edges:    1) i->j,
    REdges:   2) i<>j
    2Stars:   3) k<-i->j,      4) k->i<-j,      5) k->i->j
    Triads:   6) i->j->k<-i,   7) i->j->k->i,
    R2Stars:  8) k<>i->j,      9) k<>i<-j,     10) k<>i<>j
    RTriads: 11) i<>j->k<-i,  12) i<>j<-k->i,  13) i<>j<-k<-i
             14) i<>j<>k<-i,  15) i<>j<>k<>i
    Comparative network: 16)
    How many global parameters? 3
    Enter parameter numbers: 2 6 16
    Block structure:
      1  3
      4  2
    Select parameter and number of blocks. 0 0 to quit.
    Parameter, number of Blocks: 1 4
    Parameter, number of Blocks: 0 0
    Enter name of comparative file: vcww.neg
    Pass 1.. 2.. 3.. 4.. 5..
    Final -2 Log Likelihood = 684.549
    Goodness of Fit = 796.228
    Model Chisquare = 441.122 df = 7

               Fit       % Correct    Residuals
    Data    391    62      86.31      Absolute  220.0587
            106   253      70.47      Squared   110.2376
                 Overall   79.31

    Parameter  Block        b      S.E.       Wald    exp(b)
        2              1.2144     .2056    34.8769    3.3684
        6               .1056     .0133    62.7295    1.1114
       16              2.1955     .3020    52.8469    8.9848
        1        1    -2.5192     .3036    68.8520     .0805
        1        2    -3.0657     .3276    87.5713     .0466
        1        3    -2.8777     .2971    93.8152     .0563
        1        4    -3.7990     .3309   131.8378     .0224
    Continue? (y or n):
    --------------------------------------------

Compare these results with W&P Table 6 (model 30) and Table 9 (model 35).

NOTE: The output files contain more information, including the covariance matrix.

For these two examples, look at class4.out and vcga.out, both included in pspar.zip.
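The fit summaries printed in the sample runs can also be checked with a few lines of Python. The sketch below reads the 2 x 2 table as observed (rows) against fitted (columns) counts, and treats Model Chisquare as the drop in -2 log likelihood from a null model that gives every dyad probability 0.5. Both readings are inferred from the printed numbers, not stated in this documentation, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def percent_correct(table: List[List[int]]) -> Tuple[float, float, float]:
    """Row-wise and overall percent correct for a 2 x 2 classification table."""
    (n00, n01), (n10, n11) = table
    total = n00 + n01 + n10 + n11
    return (100.0 * n00 / (n00 + n01),
            100.0 * n11 / (n10 + n11),
            100.0 * (n00 + n11) / total)


def null_minus2ll(n_dyads: int) -> float:
    """-2 log likelihood when every dyad has probability 0.5 (assumed null)."""
    return 2.0 * n_dyads * math.log(2.0)


# (label, classification table, final -2LL, reported model chi-square)
RUNS = [
    ("class4", [[375, 40], [60, 101]], 435.405, 363.101),
    ("vcga", [[359, 94], [98, 261]], 752.992, 372.679),
    ("vcga + vcww", [[391, 62], [106, 253]], 684.549, 441.122),
]

if __name__ == "__main__":
    for label, table, final_m2ll, reported_chisq in RUNS:
        n_dyads = sum(sum(row) for row in table)
        chisq = null_minus2ll(n_dyads) - final_m2ll
        pcts = ", ".join(f"{p:.2f}" for p in percent_correct(table))
        print(f"{label}: % correct {pcts}; "
              f"chi-square {chisq:.3f} (reported {reported_chisq:.3f})")
```

The table totals, 576 for class4 and 812 for vcga, are presumably the numbers of dyads analyzed in each run.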