I have chosen the AddHealth dataset. After looking through its codebook, I became interested in studying the relationship between teenage drug abuse and some factors that most likely affect it or are affected by it.
Week 2
Primary data analysis through Python/SAS
The chosen language: Python
In this post:
- The Python code
- The formatted output
- Results summary and description
- The Python raw output
The Python code:
# -*- coding: utf-8 -*-
"""
Created on Sun Nov 8 11:21:26 2015
@author: Dr. Mohammad Elnesr
"""
import pandas
# import numpy  (not needed right now)

# defining the data source
data = pandas.read_csv('addhealth_pds.csv', low_memory=False)
# printing the number of data rows (observations) and columns (variables)
print('Number of data rows: ', len(data))
print('Number of data columns: ', len(data.columns))
# defining a Python dictionary describing the meaning of each variable
# (named "questions" so that it does not shadow the built-in dict)
questions = {
    "H1WP10": "How much you think your mother cares about you?",
    "H1RE4": "How important the religion is to you?",
    "H1TO40": "How old were you when you tried illegal drugs?",
}
# looping over the variables, handling each one independently
for variable in ["H1TO40", "H1WP10", "H1RE4"]:
    # converting to numeric; pandas.to_numeric replaces convert_objects,
    # which the FutureWarning in the raw output reports as deprecated
    data[variable] = pandas.to_numeric(data[variable], errors='coerce')
    # frequency distribution (counts per answer)
    ct1 = data.groupby(variable).size()
    # frequency distribution as a percentage of all rows
    pt1 = ct1 * 100 / len(data)
    # printing results with definitions
    print("***********************************************")
    print("Analyzing variable: ", variable)
    print("...answers the question: ", questions[variable])
    print(ct1)
    print(pt1)
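As a side note, the same counts and percentages can be produced with value_counts instead of groupby. The following is only a minimal sketch on a small made-up column: the values are hypothetical, not taken from AddHealth, and frequency_table is an illustrative helper name, not a pandas function.

```python
import pandas

def frequency_table(series):
    """Return counts and percentages per distinct value, sorted by value."""
    # value_counts skips missing values by default, like groupby().size()
    counts = series.value_counts(sort=False).sort_index()
    # The denominator is the full number of rows, so missing values
    # still count towards the total, as in the script above.
    percents = counts * 100 / len(series)
    return pandas.DataFrame({"count": counts, "percent": percents})

# Hypothetical toy answers: 0 = never, other values = age at first use
toy = pandas.Series([0, 0, 0, 14, 15, 15, None, 0])
table = frequency_table(toy)
print(table)
```

Because the denominator includes rows with missing answers, the percent column does not have to add up to 100.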
The formatted output:
Results summary and description:
The H1TO40 variable shows that 90.76% of the studied sample never tried any illegal drugs, which is a fairly good ratio. The remaining 9.24% is not all drug use, however: codes 96, 98, and 99 (about 1.54% together) appear to be non-response codes rather than ages, so roughly 7.67% of the students reported an age at which they first tried an illegal drug, and most of them started at 14 to 16 years old. A similarly large share, 84.62% of the students, felt that their mother cares about them very much, as shown in Table H1WP10. The role of religion appears clearly in Table H1RE4, where 77.3% said that it is either very important or fairly important to them.
The relationship between these three variables (and other variables) will be discussed next week.
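The H1TO40 percentages can be cross-checked directly from the counts printed in the raw output of this post. The sketch below rebuilds that frequency table by hand and separates the answers into three groups. It rests on one assumption that should be verified against the codebook: that 96, 98, and 99 are non-response codes and not ages.

```python
import pandas

# Counts copied from the H1TO40 raw output in this post (value -> frequency)
h1to40_counts = {0: 5903, 1: 15, 3: 6, 6: 4, 9: 2, 11: 12, 12: 33, 13: 61,
                 14: 85, 15: 108, 16: 96, 17: 64, 18: 13, 96: 60, 98: 36, 99: 4}
total_rows = 6504  # number of data rows reported by the script

counts = pandas.Series(h1to40_counts)

# ASSUMPTION: 96, 98 and 99 are non-response codes, not ages
nonresponse_codes = [96, 98, 99]
never = counts[0]
nonresponse = counts[nonresponse_codes].sum()
tried = counts.drop([0] + nonresponse_codes)

print("never tried:     %.2f%%" % (never * 100 / total_rows))
print("reported an age: %.2f%%" % (tried.sum() * 100 / total_rows))
print("non-response:    %.2f%%" % (nonresponse * 100 / total_rows))
print("ages 14-16 among those reporting an age: %.1f%%"
      % (tried.loc[14:16].sum() * 100 / tried.sum()))
```

The three groups add up to 6502 rows, so two of the 6504 rows have no value at all for this variable.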
The Python raw output:
runfile('X:/Dropbox/@CurrentWork/Data Analysis/PythonWorkingDirectory/FirstProgram.py', wdir='X:/Dropbox/@CurrentWork/Data Analysis/PythonWorkingDirectory')
Number of data rows: 6504
Number of data columns: 2829
***********************************************
Analyzing variable: H1TO40
...answers the question: How old were you when you tried illegal drugs?
H1TO40
0 5903
1 15
3 6
6 4
9 2
11 12
12 33
13 61
14 85
15 108
16 96
17 64
18 13
96 60
98 36
99 4
dtype: int64
H1TO40
0 90.759533
1 0.230627
3 0.092251
6 0.061501
9 0.030750
11 0.184502
12 0.507380
13 0.937884
14 1.306888
15 1.660517
16 1.476015
17 0.984010
18 0.199877
96 0.922509
98 0.553506
99 0.061501
dtype: float64
***********************************************
Analyzing variable: H1WP10
...answers the question: How much you think your mother cares about you?
H1WP10
1 15
2 39
3 127
4 445
5 5504
6 1
7 370
8 3
dtype: int64
H1WP10
1 0.230627
2 0.599631
3 1.952645
4 6.841943
5 84.624846
6 0.015375
7 5.688807
8 0.046125
dtype: float64
***********************************************
Analyzing variable: H1RE4
...answers the question: How important the religion is to you?
H1RE4
1 2812
2 2218
3 391
4 193
6 3
7 879
8 8
dtype: int64
H1RE4
1 43.234932
2 34.102091
3 6.011685
4 2.967405
6 0.046125
7 13.514760
8 0.123001
dtype: float64
X:/PythonWorkingDirectory/FirstProgram.py:18: FutureWarning: convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  ct1 = data.groupby(variable).size()
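The FutureWarning above names pd.to_numeric as the replacement for convert_objects. A minimal sketch of that conversion follows. The column is made up (the values are hypothetical), and it assumes that entries which cannot be parsed as numbers, such as blanks, should become NaN.

```python
import pandas

# Hypothetical raw column as it might be read from a CSV: numbers stored as text
raw = pandas.Series(["0", "14", " ", "96", "15"])

# errors="coerce" turns anything that cannot be parsed into NaN
numeric = pandas.to_numeric(raw, errors="coerce")

print(numeric)
print("missing values:", int(numeric.isna().sum()))
```

With errors="coerce" the blank entry is kept as a missing value instead of raising an error, so it drops out of the groupby counts but still counts in len(data).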
