Sunday, November 8, 2015

Week 2 assignment: Running Your First Program

Reminder from Week 1
I have chosen the AddHealth dataset. After looking through its codebook, I found that I am interested in studying the dependencies between drug abuse among teenagers and some factors that most likely affect it or are affected by it.


Week 2

Primary data analysis with Python/SAS

The chosen language: Python

In this post:

  • The Python code
  • The formatted output
  • Results summary and description
  • The Python raw output


The Python code:

# -*- coding: utf-8 -*-
"""
Created on Sun Nov  8 11:21:26 2015

@author: Dr. Mohammad Elnesr
"""

import pandas
# import numpy  (not needed yet)

# load the data set; low_memory=False parses the whole file at once,
# so each column gets a single inferred type
data = pandas.read_csv('addhealth_pds.csv', low_memory=False)

# print the number of data rows (observations) and columns (variables)
print('Number of data rows: ', len(data))
print('Number of data columns: ', len(data.columns))

# a dictionary describing the meaning of each variable
# (named "descriptions" so it does not shadow Python's built-in dict)
descriptions = {"H1WP10": "How much you think your mother cares about you?",
                "H1RE4": "How important the religion is to you?",
                "H1TO40": "How old were you when you tried illegal drugs?"}

# loop over the variables, analyzing each one independently
for variable in ["H1TO40", "H1WP10", "H1RE4"]:
    # convert to numbers; entries that cannot be parsed (e.g. blanks) become NaN
    data[variable] = pandas.to_numeric(data[variable], errors='coerce')
    # frequency distribution: number of rows per response code (NaN rows are dropped)
    ct1 = data.groupby(variable).size()
    # the same distribution as a percentage of all rows in the data set
    pt1 = ct1 * 100 / len(data)
    # print the results together with the variable's description
    print("***********************************************")
    print("Analyzing variable: ", variable)
    print("...answers the question: ", descriptions[variable])
    print(ct1)
    print(pt1)
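As an alternative to the groupby loop above, the counts and percentages for one variable can be collected into a single table with Series.value_counts. This is a minimal sketch, not the code that produced the results in this post: the helper name frequency_table and the toy column are hypothetical, and it assumes a pandas version that provides pandas.to_numeric (the converter recommended by the FutureWarning in the raw output). Unlike groupby(...).size(), it keeps missing values as their own row so they stay visible.

```python
import pandas as pd


def frequency_table(series: pd.Series, total: int) -> pd.DataFrame:
    """Count and percentage of rows per response code (hypothetical helper).

    `total` is the percentage denominator (len(data) in the post's loop).
    Unlike groupby(...).size(), missing values are kept as their own row.
    """
    numeric = pd.to_numeric(series, errors="coerce")  # unparseable entries -> NaN
    # count every code, including NaN, then order the codes ascending (NaN last)
    counts = numeric.value_counts(sort=False, dropna=False).sort_index()
    return pd.DataFrame(
        {"count": counts.to_numpy(), "percent": counts.to_numpy() * 100 / total},
        index=counts.index,
    )


# Hypothetical toy column: response codes stored as text, with one blank answer.
toy = pd.Series(["5", "5", "4", " ", "7", "5"])
table = frequency_table(toy, total=len(toy))
print(table)
```

Under these assumptions, calling frequency_table(data['H1TO40'], len(data)) on the real data would give the same counts and percentages as ct1 and pt1, plus a row for any missing values.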

The formatted output:


Results summary and description:

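It is noticed from the H1TO40 variable that 90.76% of the studied sample never tried any illegal drugs, which is a large majority. However, the remaining 9.24% are not all drug users: about 7.67% (499 students) reported an age of first use between 1 and 18, while about 1.54% gave the codes 96, 98 or 99, which in the usual Add Health coding appear to stand for non-substantive answers (such as refused or don't know) rather than ages. Most of those who did try illegal drugs (289 of 499, about 58%) first did so between the ages of 14 and 16. A similarly large share, 84.62% of the students, felt that their mother cares about them very much, as shown in Table H1WP10. The role of religion appears clearly in Table H1RE4, where 77.3% said that religion is either very important or fairly important to them.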

The relationship between these three variables (and other variables) will be discussed next week.
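The percentages quoted in the summary can be re-derived from the counts in the raw output section. This sketch hard-codes those printed counts rather than reading the data file, and it assumes (following the usual Add Health coding; check the codebook before relying on it) that H1TO40 code 0 means "never tried", codes 1 to 18 are ages at first use, and codes 96, 98 and 99 are non-response codes rather than ages.

```python
# Counts copied from this post's raw output; N is the number of data rows.
N = 6504

# H1TO40 counts per code. Assumption: 0 = never tried, 1-18 = age at first use,
# 96/98/99 = non-response codes (e.g. refused, don't know), not ages.
h1to40 = {0: 5903, 1: 15, 3: 6, 6: 4, 9: 2, 11: 12, 12: 33, 13: 61, 14: 85,
          15: 108, 16: 96, 17: 64, 18: 13, 96: 60, 98: 36, 99: 4}
h1wp10_code5 = 5504          # H1WP10 code 5 ("very much")
h1re4_code1_2 = 2812 + 2218  # H1RE4 codes 1 and 2 (very / fairly important)


def pct(count, total=N):
    """Share of all rows, in percent, computed like pt1 in the loop."""
    return count * 100 / total


tried_n = sum(c for code, c in h1to40.items() if 1 <= code <= 18)
never = pct(h1to40[0])
tried = pct(tried_n)
nonresponse = pct(sum(c for code, c in h1to40.items() if code >= 96))
# share of the students who tried drugs whose first use was at age 14, 15 or 16
ages_14_16 = (h1to40[14] + h1to40[15] + h1to40[16]) * 100 / tried_n
mother = pct(h1wp10_code5)
religion = pct(h1re4_code1_2)

print(f"H1TO40 never: {never:.2f}%, tried: {tried:.2f}%, codes 96-99: {nonresponse:.2f}%")
print(f"first use at 14-16: {ages_14_16:.1f}% of those who tried")
print(f"H1WP10 very much: {mother:.2f}%, H1RE4 very/fairly important: {religion:.2f}%")
```

This prints 90.76%, 7.67% and 1.54% for H1TO40, 57.9% for first use at ages 14 to 16, and 84.62% and 77.34% for H1WP10 and H1RE4.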

The Python raw output:

runfile('X:/Dropbox/@CurrentWork/Data Analysis/PythonWorkingDirectory/FirstProgram.py', wdir='X:/Dropbox/@CurrentWork/Data Analysis/PythonWorkingDirectory')

Number of data rows:  6504
Number of data columns:  2829
***********************************************
Analyzing variable:  H1TO40
...answers the question:  How old were you when you tried illegal drugs?
H1TO40
0     5903
1       15
3        6
6        4
9        2
11      12
12      33
13      61
14      85
15     108
16      96
17      64
18      13
96      60
98      36
99       4
dtype: int64
H1TO40
0     90.759533
1      0.230627
3      0.092251
6      0.061501
9      0.030750
11     0.184502
12     0.507380
13     0.937884
14     1.306888
15     1.660517
16     1.476015
17     0.984010
18     0.199877
96     0.922509
98     0.553506
99     0.061501
dtype: float64
***********************************************
Analyzing variable:  H1WP10
...answers the question:  How much you think your mother cares about you?
H1WP10
1      15
2      39
3     127
4     445
5    5504
6       1
7     370
8       3
dtype: int64
H1WP10
1     0.230627
2     0.599631
3     1.952645
4     6.841943
5    84.624846
6     0.015375
7     5.688807
8     0.046125
dtype: float64
***********************************************
Analyzing variable:  H1RE4
...answers the question:  How important the religion is to you?
H1RE4
1    2812
2    2218
3     391
4     193
6       3
7     879
8       8
dtype: int64
H1RE4
1    43.234932
2    34.102091
3     6.011685
4     2.967405
6     0.046125
7    13.514760
8     0.123001
dtype: float64

X:/PythonWorkingDirectory/FirstProgram.py:18: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  ct1 = data.groupby(variable).size()

